
How to Protect Your MySQL & MariaDB Database Against Cyberattacks When on a Public Network


It is sometimes inevitable to run MySQL database servers on a public or exposed network. This is common in shared hosting environments, where multiple services, often including the application itself, run on the same server as the database. If you have this kind of setup, you should always have some kind of protection against cyberattacks like denial-of-service, hacking, cracking and data breaches, all of which can result in data loss. These are things that we always want to avoid for our database server.

Here are some tips on what we can do to improve the security of our MySQL or MariaDB setup.

Scan Your Database Servers Regularly

Protection against malicious files on the server is critical. Scan the server regularly for viruses, spyware, malware or rootkits, especially if the database server is co-located with other services like a mail server, HTTP, FTP, DNS, WebDAV, telnet and so on. Most database break-ins originate from the application tier facing the public network, so it's important to scan all files, especially web/application files, since they are one of the entry points into the server. If those are compromised, the hacker can get into the application directory and read the application files, which might contain sensitive information such as the database login credentials.

ClamAV is one of the most widely known and widely trusted antivirus solutions for a variety of operating systems, including Linux. It's free and very easy to install and comes with a fairly good detection mechanism to look for unwanted things in your server. Schedule periodic scans in the cron job, for example:

0 3 * * * /bin/freshclam ; /bin/clamscan / --recursive=yes -i > /tmp/clamav.log ; mail -s clamav_log_`hostname` monitor@mydomain.local < /tmp/clamav.log

The above updates the ClamAV virus database, scans all directories and files, and emails you the execution status report, every day at 3 AM.

Use Stricter User Roles and Privileges

When creating a MySQL user, do not allow all hosts to access the MySQL server with the wildcard host (%). You should scan the mysql.user table and look for any wildcard host values, as shown in the following statement:

mysql> SELECT user,host FROM mysql.user WHERE host = '%';

+---------+------+
| user    | host |
+---------+------+
| myadmin | %    |
| sbtest  | %    |
| user1   | %    |
+---------+------+

From the above output, restrict or remove all users that have only the '%' value under the Host column. Users that need to access the MySQL server remotely can be enforced to use the SSH tunnelling method, which does not require remote host configuration for MySQL users. Most MySQL administration clients such as MySQL Workbench and HeidiSQL can be configured to connect to a MySQL server via SSH tunnelling, so it's possible to completely eliminate remote connections for MySQL users.

Also, limit the SUPER privilege to users connecting from localhost, or via the UNIX socket file. Be more cautious when assigning the FILE privilege to non-root users since it permits reading and writing files on the server using the LOAD DATA INFILE and SELECT ... INTO OUTFILE statements. Any user to whom this privilege is granted can also read or write any file that the MySQL server can read or write.
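As a rough sketch, tightening such accounts could look like the following (account names are taken from the example output above; adjust to your environment, and note that REVOKE fails if the privilege was never granted):

mysql> RENAME USER 'myadmin'@'%' TO 'myadmin'@'localhost'; -- re-scope a wildcard account
mysql> SHOW GRANTS FOR 'sbtest'@'%'; -- inspect before trimming privileges
mysql> REVOKE FILE ON *.* FROM 'sbtest'@'%'; -- only if the account actually holds FILE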

Change the Database Default Settings

By moving away from the default setup, naming and configuration, we can significantly reduce the attack vector. The following are examples of default configurations that DBAs can easily change but commonly overlook (a sample configuration sketch follows the list):

  • Change default MySQL port to other than 3306.
  • Rename the MySQL root username to other than "root".
  • Enforce password expiration and reduce the password lifetime for all users.
  • If MySQL is co-located with the application servers, enforce connection through UNIX socket file only, and stop listening on port 3306 for all IP addresses.
  • Enforce client-server encryption and server-server replication encryption.
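A minimal my.cnf sketch covering the configurable items above; the values are illustrative, a custom port and skip_networking are alternatives rather than companions, and the last two options assume MySQL 5.7 or later (renaming the root account is done with RENAME USER rather than in the configuration file):

[mysqld]
port = 3307                            # non-default port, if remote access is required
# skip_networking                      # or listen on the UNIX socket only (co-located apps)
socket = /var/run/mysqld/mysqld.sock
default_password_lifetime = 90         # days before passwords expire
require_secure_transport = ON          # enforce TLS for client-server connections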

We actually have covered this in detail in this blog post, How to Secure MySQL/MariaDB Servers.

Setup a Delayed Slave

A delayed slave is just a typical slave, except that the slave server intentionally executes transactions later than the master by at least a specified amount of time; this is available from MySQL 5.6. Basically, an event received from the master is not executed until at least N seconds later than its execution on the master. The result is that the slave reflects the state of the master some time back in the past.

A delayed slave can be used to recover data, which is helpful when the problem is found immediately, within the period of delay. Suppose we configured a slave with a 6-hour delay from the master. If our database was modified or deleted (accidentally by a developer or deliberately by a hacker) within this time range, there is a possibility for us to revert to the moment right before it happened: stop the current master, then let the slave catch up to a certain point with the following command:

START SLAVE UNTIL MASTER_LOG_FILE='xxxxx', MASTER_LOG_POS=yyyyyy;

Then promote the slave to become the new master. This method is probably the fastest way to recover your MySQL database in a production environment without having to reload a backup. Consider having a number of delayed slaves with different delay durations; this blog, Multiple Delayed Replication Slaves for Disaster Recovery with Low RTO, shows how to set up cost-effective delayed replication servers on top of Docker containers.
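Configuring the delay itself is done on the slave; MASTER_DELAY is specified in seconds and requires MySQL 5.6 or later:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 21600; -- execute events at least 6 hours after the master
mysql> START SLAVE;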

Enable Binary Logging

Binary logging is generally recommended even if you are running a standalone MySQL/MariaDB server. The binary log contains information about SQL statements that modify database contents. The information is stored in the form of "events" that describe the modifications. Despite the performance impact, the binary log gives you the possibility to replay your database server to the exact point where you want it to be restored, also known as point-in-time recovery (PITR). Binary logging is also mandatory for replication.

With binary logging enabled, one has to include the binary log file and position information when taking a full backup. For mysqldump, using the --master-data flag with value 1 or 2 will print out the necessary information that we can use as a starting point to roll forward the database when replaying the binary logs later on.
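As a sketch, a full backup carrying those coordinates could be taken like this; --master-data=2 writes the CHANGE MASTER TO statement as a comment at the top of the dump, and --single-transaction keeps the backup consistent for InnoDB tables:

$ mysqldump --single-transaction --master-data=2 --all-databases > full_backup.sql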

With binary logging enabled, you can use another cool recovery feature called flashback, which is described in the next section.

Enable Flashback

The flashback feature is available in MariaDB, where you can restore the data in a MySQL database or table back to a previous snapshot. Flashback uses the mysqlbinlog utility to create the rollback statements, and it needs a FULL binary log row image for that. Thus, to use this feature, the MySQL/MariaDB server must be configured with the following:

[mysqld]

...

binlog_format = ROW

binlog_row_image = FULL

The following architecture diagram illustrates how flashback is configured on one of the slaves:

To perform the flashback operation, you first have to determine the date and time you want to "see" the data at, or the binary log file and position. Then, use the --flashback flag with the mysqlbinlog utility to generate SQL statements that roll back the data to that point. In the generated SQL file, you will notice that DELETE events are converted to INSERTs and vice versa, and the WHERE and SET parts of UPDATE events are swapped.

The following command line should be executed on slave2 (configured with binlog_row_image=FULL):

$ mysqlbinlog --flashback --start-datetime="2020-02-17 01:30:00"  /var/lib/mysql/mysql-bin.000028 -v --database=shop --table=products > flashback_to_2020-02-17_013000.sql

Then, detach slave2 from the replication chain because we are going to break it and use the server to roll back our data:

mysql> STOP SLAVE;

mysql> RESET MASTER;

mysql> RESET SLAVE ALL;

Finally, import the generated SQL file into the MariaDB server for database shop on slave2:

$ mysql -u root -p shop < flashback_to_2020-02-17_013000.sql

When the above is applied, the table "products" will be at the state of 2020-02-17 01:30:00. Technically, the generated SQL file can be applied to both MariaDB and MySQL servers. You could also transfer the mysqlbinlog binary from a MariaDB server so you can use the flashback feature on a MySQL server. However, MySQL's GTID implementation is different from MariaDB's, so restoring the SQL file requires you to disable MySQL GTID.

A couple of advantages of using flashback are that you do not need to stop the MySQL/MariaDB server to carry out this operation, and, when the amount of data to revert is small, the flashback process is much faster than recovering the data from a full backup.

Log All Database Queries

The general log basically captures every SQL statement executed by clients on the MySQL server. However, this might not be a popular decision on a busy production server due to the performance impact and space consumption. If performance matters, the binary log should take priority over the general log. The general log can be enabled during runtime by running the following commands:

mysql> SET global general_log_file='/tmp/mysql.log'; 

mysql> SET global log_output = 'file';

mysql> SET global general_log = ON;

You can also set the general log output to a table:

mysql> SET global log_output = 'table';

You can then use the standard SELECT statement against the mysql.general_log table to retrieve queries. Do expect a bit more performance impact when running with this configuration as shown in this blog post.
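For example, a quick look at the latest captured statements could be done as follows (the columns are from the standard mysql.general_log table definition):

mysql> SELECT event_time, user_host, argument FROM mysql.general_log ORDER BY event_time DESC LIMIT 10;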

Otherwise, you can use external monitoring tools that can perform query sampling and monitoring so you can filter and audit the queries that come into the server. ClusterControl can be used to collect and summarize all your queries, as shown in the following screenshots where we filter all queries that contain the DELETE string:

Similar information is also available under ProxySQL's top queries page (if your application is connecting via ProxySQL):

This can be used to track recent changes that have happened to the database server and can also be used for auditing purposes. 

Conclusion

Your MySQL and MariaDB servers must be well-protected at all times since they usually contain sensitive data that attackers are after. You may also use ClusterControl to manage the security aspects of your database servers, as showcased in this blog post, How to Secure Your Open Source Databases with ClusterControl.


What to Look for if Your PostgreSQL Replication is Lagging


Replication lag is not a widespread issue for most PostgreSQL setups, but it can occur, and when it does it can impact your production setup. PostgreSQL is designed to handle multiple threads, such as query parallelism or deploying worker threads to handle specific tasks based on the assigned values in the configuration. PostgreSQL is designed to handle heavy and stressful loads, but sometimes (due to a bad configuration) your server might still go south.

Identifying the replication lag in PostgreSQL is not a complicated task to do, but there are a few different approaches to look into the problem. In this blog, we'll take a look at what things to look at when your PostgreSQL replication is lagging.

Types of Replication in PostgreSQL

Before diving into the topic, let's first see how replication in PostgreSQL has evolved, as there is a diverse set of approaches and solutions for dealing with replication.

Warm standby for PostgreSQL was implemented in version 8.2 (back in 2006) and was based on the log shipping method. This means that WAL records are directly moved from one database server to another to be applied, an approach analogous to PITR, or very much like what you would do with rsync.

This approach, though old, is still used today and some institutions actually prefer it. It implements file-based log shipping by transferring WAL records one file (WAL segment) at a time. It has a downside though: in case of a major failure on the primary server, transactions not yet shipped will be lost. There is a window for data loss (you can tune this with the archive_timeout parameter, which can be set as low as a few seconds, but such a low setting will substantially increase the bandwidth required for file shipping).

In PostgreSQL version 9.0, Streaming Replication was introduced. This feature allows us to stay more up-to-date than file-based log shipping. Its approach is to transfer WAL records (a WAL file is composed of WAL records) on the fly (merely record-based log shipping) between a master server and one or several standby servers, without waiting for the WAL file to be filled. In practice, a process called WAL receiver, running on the standby server, connects to the primary server using a TCP/IP connection. On the primary server, another process named WAL sender is in charge of sending the WAL registries to the standby server(s) as they happen.

Asynchronous replication setups in streaming replication can incur problems such as data loss or slave lag, so version 9.1 introduced synchronous replication. In synchronous replication, each commit of a write transaction waits until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and the standby server. This method minimizes the possibility of data loss; for that to happen, both the master and the standby would need to fail at the same time.

The obvious downside of this configuration is that the response time for each write transaction increases, as we need to wait until all parties have responded. Unlike MySQL's semi-synchronous replication, which falls back to asynchronous when a timeout occurs, PostgreSQL offers no such fallback. With PostgreSQL, the time for a commit is (at minimum) the round trip between the primary and the standby. Read-only transactions are not affected by this.

PostgreSQL is continuously improving, and its replication options are diverse. For example, you can use physical streaming asynchronous replication or logical streaming replication. Both are monitored differently, though they use the same underlying mechanism for sending data, which is still streaming replication. For more details, check the manual for the different types of replication solutions in PostgreSQL.

Causes of PostgreSQL Replication Lag

As defined in our previous blog, replication lag is the delay for a transaction or operation, calculated as the difference between its execution time on the primary/master and on the standby/slave node.

Since PostgreSQL uses streaming replication, it's designed to be fast: changes are recorded as a sequence of log records (byte-by-byte), which are intercepted by the WAL receiver and written to the WAL file. The startup process in PostgreSQL then replays the data from that WAL segment, and streaming replication begins. In PostgreSQL, replication lag can be caused by these factors:

  • Network issues
  • The standby is unable to find the WAL segment on the primary. Usually, this is due to the checkpointing behavior where WAL segments are rotated or recycled
  • Busy nodes (primary and standby(s)), caused by external processes or resource-intensive queries
  • Bad hardware or hardware issues causing lag
  • Poor configuration in PostgreSQL, such as a small max_wal_senders setting while processing tons of transaction requests (or a large volume of changes)

What To Look for With PostgreSQL Replication Lag

PostgreSQL replication is diverse, but monitoring replication health is subtle, not complicated. The approach we'll showcase is based on a primary-standby setup with asynchronous streaming replication. Logical replication does not benefit from most of the cases we're discussing here, though the view pg_stat_subscription can help you collect information. However, we'll not focus on that in this blog.

Using pg_stat_replication View

The most common approach is to run a query referencing this view on the primary node. Remember, you can only harvest information from the primary node using this view. Based on PostgreSQL 11, the view has the following definition:

postgres=# \d pg_stat_replication
                    View "pg_catalog.pg_stat_replication"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 pid              | integer                  |           |          |
 usesysid         | oid                      |           |          |
 usename          | name                     |           |          |
 application_name | text                     |           |          |
 client_addr      | inet                     |           |          |
 client_hostname  | text                     |           |          |
 client_port      | integer                  |           |          |
 backend_start    | timestamp with time zone |           |          |
 backend_xmin     | xid                      |           |          |
 state            | text                     |           |          |
 sent_lsn         | pg_lsn                   |           |          |
 write_lsn        | pg_lsn                   |           |          |
 flush_lsn        | pg_lsn                   |           |          |
 replay_lsn       | pg_lsn                   |           |          |
 write_lag        | interval                 |           |          |
 flush_lag        | interval                 |           |          |
 replay_lag       | interval                 |           |          |
 sync_priority    | integer                  |           |          |
 sync_state       | text                     |           |          |

Where the fields are defined as follows (including PG < 10 versions):

  • pid: Process ID of the WAL sender process
  • usesysid: OID of the user used for streaming replication
  • usename: Name of the user used for streaming replication
  • application_name: Application name connected to the master
  • client_addr: Address of the standby/streaming replication
  • client_hostname: Hostname of the standby
  • client_port: TCP port number on which the standby communicates with the WAL sender
  • backend_start: Start time when the standby connected to the master
  • backend_xmin: The standby's xmin horizon reported by hot_standby_feedback
  • state: Current WAL sender state, i.e. streaming
  • sent_lsn/sent_location: Last transaction location sent to the standby
  • write_lsn/write_location: Last transaction written on disk at the standby
  • flush_lsn/flush_location: Last transaction flushed on disk at the standby
  • replay_lsn/replay_location: Last transaction replayed on the standby
  • write_lag: Elapsed time between commit on the primary and write on the standby (written but not yet flushed there)
  • flush_lag: Elapsed time between commit on the primary and flush on the standby (flushed but not yet applied)
  • replay_lag: Elapsed time between commit on the primary and replay on the standby (fully applied)
  • sync_priority: Priority of the standby server being chosen as the synchronous standby
  • sync_state: Sync state of the standby (async or sync)

A sample query would look as follows in PostgreSQL 9.6,

paultest=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------

pid              | 7174

usesysid         | 16385

usename          | cmon_replication

application_name | pgsql_1_node_1

client_addr      | 192.168.30.30

client_hostname  | 

client_port      | 10580

backend_start    | 2020-02-20 18:45:52.892062+00

backend_xmin     | 

state            | streaming

sent_location    | 1/9FD5D78

write_location   | 1/9FD5D78

flush_location   | 1/9FD5D78

replay_location  | 1/9FD5D78

sync_priority    | 0

sync_state       | async

-[ RECORD 2 ]----+------------------------------

pid              | 7175

usesysid         | 16385

usename          | cmon_replication

application_name | pgsql_80_node_2

client_addr      | 192.168.30.20

client_hostname  | 

client_port      | 60686

backend_start    | 2020-02-20 18:45:52.899446+00

backend_xmin     | 

state            | streaming

sent_location    | 1/9FD5D78

write_location   | 1/9FD5D78

flush_location   | 1/9FD5D78

replay_location  | 1/9FD5D78

sync_priority    | 0

sync_state       | async

This basically tells you which locations in the WAL segments have been written, flushed, or applied, providing a granular overview of the replication status.
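You can also turn those LSN columns into a byte count per standby directly on the primary with pg_wal_lsn_diff (PG 10+ names; on older versions use pg_current_xlog_location, pg_xlog_location_diff and the *_location columns):

postgres=# SELECT application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;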

Queries to Use In the Standby Node

On the standby node, there are supported functions that you can combine into a query to get an overview of your standby's replication health. To do this, you can run the following query (based on PG version > 10):

postgres=#  select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_is_wal_replay_paused       | f

pg_last_wal_receive_lsn       | 0/2705BDA0

pg_last_wal_replay_lsn        | 0/2705BDA0

pg_last_xact_replay_timestamp | 2020-02-21 02:18:54.603677+00

In older versions, you can use the following query:

postgres=# select pg_is_in_recovery(),pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_last_xlog_receive_location | 1/9FD6490

pg_last_xlog_replay_location  | 1/9FD6490

pg_last_xact_replay_timestamp | 2020-02-21 08:32:40.485958-06

What does the query tell us? The functions are defined as follows:

  • pg_is_in_recovery(): (boolean) True if recovery is still in progress.
  • pg_last_wal_receive_lsn()/pg_last_xlog_receive_location():  (pg_lsn) The write-ahead log location received and synced to disk by streaming replication. 
  • pg_last_wal_replay_lsn()/pg_last_xlog_replay_location():  (pg_lsn) The last write-ahead log location replayed during recovery. If recovery is still in progress this will increase monotonically.
  • pg_last_xact_replay_timestamp():  (timestamp with time zone) Get timestamp of last transaction replayed during recovery. 

Using some basic math, you can combine these functions. The most commonly used combination among DBAs is:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

or in versions PG < 10,

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

Although this query is in common use among DBAs, it still doesn't provide you an accurate view of the lag. Why? Let's discuss this in the next section.

Identifying Lag Caused by WAL Segment's Absence

PostgreSQL standby nodes, which are in recovery mode, do not report the exact state of your replication. Unless you view the PG log, you cannot gather information on what's going on; there's no query you can run to determine this. In most cases, organizations and even small institutions rely on third-party software to alert them when an alarm is raised.

One of these is ClusterControl, which offers observability, sends alerts when alarms are raised, and recovers your node in case a disaster or catastrophe happens. Let's take this scenario: my primary-standby async streaming replication cluster has failed. How would you know if something's wrong? Let's combine the following:

Step 1: Determine if There's a Lag

postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

postgres-# THEN 0

postgres-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

postgres-# END AS log_delay;

-[ RECORD 1 ]

log_delay | 0

Step 2: Determine the WAL Segments Received From the Primary and Compare with Standby Node

## Get the master's current LSN. Run the query below on the master

postgres=# SELECT pg_current_wal_lsn();

-[ RECORD 1 ]------+-----------

pg_current_wal_lsn | 0/925D7E70

For older versions of PG < 10, use pg_current_xlog_location.

## Get the current WAL segments received (flushed or applied/replayed)

postgres=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_is_wal_replay_paused       | f

pg_last_wal_receive_lsn       | 0/2705BDA0

pg_last_wal_replay_lsn        | 0/2705BDA0

pg_last_xact_replay_timestamp | 2020-02-21 02:18:54.603677+00

That seems to look bad. 

Step 3: Determine How Bad it Could Be

Now, let's combine the results from step #1 and step #2 and get the diff. PostgreSQL has a function for this called pg_wal_lsn_diff, which is defined as:

pg_wal_lsn_diff(lsn pg_lsn, lsn pg_lsn) / pg_xlog_location_diff (location pg_lsn, location pg_lsn):  (numeric) Calculate the difference between two write-ahead log locations

Now, let's use it to determine the lag. You can run it on any PG node, since we're just providing static values:

postgres=# select pg_wal_lsn_diff('0/925D7E70','0/2705BDA0');
-[ RECORD 1 ]---+-----------

pg_wal_lsn_diff | 1800913104

Let's estimate how much 1800913104 is; it seems about 1.68 GiB may be absent from the standby node:

postgres=# select round(1800913104/pow(1024,3.0),2) missing_lsn_GiB;

-[ RECORD 1 ]---+-----

missing_lsn_gib | 1.68

Lastly, you can look at the logs (even before running the queries above), for example with tail -5f, to follow along and check what's going on. Do this for both the primary and standby nodes. In this example, we'll see that there is a problem:

## Primary

root@debnode4:/var/lib/postgresql/11/main# tail -5f log/postgresql-2020-02-21_033512.log

2020-02-21 16:44:33.574 UTC [25023] ERROR:  requested WAL segment 000000030000000000000027 has already been removed

...



## Standby

root@debnode5:/var/lib/postgresql/11/main# tail -5f log/postgresql-2020-02-21_014137.log 

2020-02-21 16:45:23.599 UTC [26976] LOG:  started streaming WAL from primary at 0/27000000 on timeline 3

2020-02-21 16:45:23.599 UTC [26976] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000030000000000000027 has already been removed

...

When encountering this issue, it's better to rebuild your standby nodes. In ClusterControl, it's as easy as one click. Just go to the Nodes/Topology section, and rebuild the node just like below:

Other Things to Check

You can use the same approach as in our previous blog (for MySQL), using a combination of system tools such as ps, top, iostat and netstat. For example, you can also get the current recovered WAL segment from the standby node:

root@debnode5:/var/lib/postgresql/11/main# ps axufwww|egrep "postgre[s].*startup"

postgres  8065 0.0 8.3 715820 170872 ?       Ss 01:41 0:03 \_ postgres: 11/main: startup   recovering 000000030000000000000027

How Can ClusterControl Help?

ClusterControl offers an efficient way of monitoring your database nodes, from the primary down to the slave nodes. Going to the Overview tab, you already have a view of your replication health:

Basically, the two screenshots above display the replication health and the current WAL segments. That's not all. ClusterControl also shows the current activity of your cluster.

Conclusion

Monitoring replication health in PostgreSQL can take different approaches, as long as they meet your needs. Using third-party tools with observability that can notify you in case of catastrophe, whether open source or enterprise, is a good route. The most important thing is to have your disaster recovery plan and business continuity planned ahead of such trouble.

Senior Engineering Manager


We are looking for a highly motivated, experienced Senior Engineering Manager to work closely with product management and business stakeholders.

Our microservices backend architecture and cloud platform provides an infrastructure to launch new innovative cloud services for the database monitoring and management market. 

You will lead a team of developers (frontend & backend) on our next generation cloud services and the on-premise product ClusterControl which is the leading monitoring and management application for open source databases and used by thousands of companies.

As a key leader on our team, you'll play a crucial role in maintaining and evolving Severalnines engineering culture and coaching individuals to achieve their very best. 

This is a team leadership position.

Key Job Responsibilities

  • Manage a team of developers responsible for Severalnines products
  • Own your team’s technical strategy and roadmap in close collaboration with Product, Design, and Operations
  • Be a visionary and propose, drive improvements to our cloud services and ClusterControl
  • Work with your team to help understand requirements, evaluate new features and architecture and help drive decisions
  • Collaborate with CTO and Product Management to contribute to the strategic direction of the company.
  • Build and foster a high performance culture, mentor team members and provide your team with the tools and motivation to make things happen
  • Attend daily standup meetings, manage releases and patch builds, collaborate with customer support

Essential Qualifications

  • Previously managed teams of engineers with ownership of high scale, cloud based production systems
  • Proven experience managing fully remote technical teams and resolving complex and ambiguous issues from both a business and a technical perspective
  • Track record of delivering projects and collaborating with leaders across the organization: including CTO, Product, Design, and Operations
  • Prior domain experience working with cloud services and/or database management solutions
  • Experienced in one or more of the tech stack and tools in use today within the team that includes AWS, Google Cloud, Jenkins/TeamCity CI/CD, Atlassian tools (JIRA/Confluence), gRPC, Go, React, AngularJS, C++, Open-source DBs
  • Expert experience with development processes and methodologies such as Kanban or Scrum
  • Expertise and passion for developing others through regular coaching and mentoring, 1:1s, and performance reviews
  • Excellent verbal and written communication skills
  • A hands-on attitude and a willingness to get things done

Additional Details

We’re a software startup with a globally distributed team. As we don’t have any offices, you will be working remotely; in other words, we all work from our homes. Therefore, a good laptop and access to a reliable, high-speed Internet connection are required. The company is based in Europe and although no travel is required, we do encourage attendance at our annual meeting which rotates cities. Contact us to find out more about the working life at Severalnines!

An Introduction to Percona Server for MongoDB 4.2


When choosing a NoSQL database technology important considerations should be taken into account, such as performance, resilience, reliability, and security. These key factors should also be aligned with achieving business goals, at least as far as the database is concerned. 

Many technologies have come into play to improve these aspects, and it is advisable for an organisation to evaluate the salient options and try integrating them into its database systems.

New technologies should ensure maximum performance to enhance the achievement of business goals, at an affordable operating cost, and with additional management features such as error detection and alerting systems.

In this blog we will discuss the Percona version of MongoDB and how it expands the power of MongoDB in a variety of ways.

What is Percona Server for MongoDB?

For a database to perform well, there must be a well-tuned underlying server to handle read and write transactions. Percona Server for MongoDB is a free, open-source, drop-in replacement for the MongoDB Community Edition, but with additional enterprise-grade functionality. It is designed with some major improvements on the default MongoDB server setup.

It delivers high performance, improved security, and reliability for optimum performance with reduced expenditure on proprietary software vendor relationships. 

Percona Server for MongoDB Salient Features

MongoDB Community Edition is the core of the Percona server, considering that it already provides important features such as the flexible schema, distributed transactions, the familiarity of JSON documents and native high availability. On top of this, Percona Server for MongoDB integrates the following salient features that enable it to satisfy the aspects we have mentioned above:

  • Hot Backups
  • Data at rest encryption
  • Audit Logging
  • Percona Memory Engine
  • External LDAP Authentication with SASL
  • HashiCorp Vault Integration
  • Enhanced query profiling

Hot Backups 

Percona Server for MongoDB creates a physical data backup on a running server in the background, without any noticeable operational degradation. This is achievable by running the createBackup command as an administrator on the admin database and specifying the backup directory.

> use admin

switched to db admin

> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})

{ "ok" : 1 }

When you receive {"ok" : 1 } then the backup was successful. Otherwise, if for example you specify an empty backup directory path, you may receive an error response i.e:

{ "ok" : 0, "errmsg" : "Destination path must be absolute" }

Restoring the backup requires you to first stop the mongod instance, clean the data directory, copy the files from the backup directory and then restart the mongod service. This can be done by running the command below:

$ service mongod stop && rm -rf /var/lib/mongodb/* && cp --recursive /my/backup/data/path /var/lib/mongodb/ && service mongod start

You can also store the backup in archive format if using Percona Server for MongoDB 4.2.1-1:

> use admin

> db.runCommand({createBackup: 1, archive: "path/to/archive.tar" })

You can also back up directly to AWS S3, using the default settings or with more configuration. For a default S3 bucket backup:

> db.runCommand({createBackup: 1,  s3: {bucket: "backup", path: "newBackup"}})
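A sketch with explicit settings is shown below; the field names mirror Percona's documented S3 options, and the endpoint and credentials are placeholders:

> db.runCommand({createBackup: 1, s3: {bucket: "backup", path: "newBackup", endpoint: "s3.us-west-2.amazonaws.com", accessKeyId: "ACCESS_KEY_ID", secretAccessKey: "SECRET_ACCESS_KEY"}})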

Data-at-Rest Encryption

MongoDB version 3.2 introduced data-at-rest encryption for the WiredTiger storage engine, ensuring that data files can be decrypted and read only by parties with the decryption key. Data-at-rest encryption in Percona Server for MongoDB was introduced in version 3.6 to go hand in hand with the data-at-rest encryption interface in MongoDB. However, the latest version does not include support for the Amazon AWS and KMIP key management services.

The encryption can also be applied to rollback files when data-at-rest encryption is enabled. Percona Server for MongoDB uses the encryptionCipherMode option with two selectable cipher modes:

  1. AES256-CBC (default cipher mode)
  2. AES256-GCM

You can select the cipher mode with one of the commands below:

$ mongod ... --encryptionCipherMode AES256-CBC

or

$ mongod ... --encryptionCipherMode AES256-GCM

We use the --encryptionKeyFile option to specify the path to a file that contains the encryption key:

$ mongod ... --enableEncryption --encryptionKeyFile <fileName>
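The key file is expected to contain a base64-encoded key and should be readable only by the mongod user; a common way to generate one (the path is an example) is:

$ openssl rand -base64 32 > /data/key/mongodb.key
$ chmod 600 /data/key/mongodb.key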

Audit Logging

For every database system, administrators have a mandate to keep track of the activities taking place. In Percona Server for MongoDB, when auditing is enabled, the server generates an audit log file that contains information about different user events such as authorization and authentication. However, when starting the server with auditing enabled, the logs won't be displayed dynamically during runtime.

Audit logging in MongoDB Community Edition can take two data formats, JSON and BSON, whereas in Percona Server for MongoDB it is limited to JSON files only. The server also logs only important commands, contrary to MongoDB, which logs everything. Since the filtering syntax in Percona is rather unclear, enabling the audit log without filtering yields more entries, from which one can then narrow down to one's own specifications.
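As a sketch, enabling a JSON audit log at startup could look like the following; the flags mirror Percona's documented audit options, and the destination path is an example:

$ mongod ... --auditDestination file --auditFormat JSON --auditPath /var/log/mongodb/auditLog.json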

Percona Memory Engine

This is a special configuration of the WiredTiger storage engine that does not store user data on disk. The data fully resides, and is readily available, in main memory, except for diagnostic data that is written to disk. This makes data processing much faster, but with the consideration that you must ensure there is enough memory to hold the data set and that the server does not shut down. One can select a storage engine with the --storageEngine option. Data created for one storage engine is not compatible with other storage engines, because each storage engine has its own data model. For instance, to select the in-memory storage engine, first stop any running mongod instance and then issue the commands:

$ service mongod stop

$ mongod --storageEngine inMemory --dbpath <newDataDir>

If you already have some data in your default MongoDB Community Edition installation and you would like to migrate to the Percona Memory Engine, just use the mongodump and mongorestore utilities by issuing the commands:

$ mongodump --out <dumpDir>

$ service mongod stop

$ rm -rf /var/lib/mongodb/*

$ sed -i '/engine: .*inMemory/s/#//g' /etc/mongod.conf

$ service mongod start

$ mongorestore <dumpDir>

External LDAP Authentication With SASL

Whenever clients make either a read or a write request to a MongoDB mongod instance, they first need to authenticate against the MongoDB server's user database. External authentication allows the MongoDB server to verify the client credentials (username and password) against a separate service. The external authentication architecture involves:

  1. LDAP Server which remotely stores all user credentials
  2. SASL Daemon that is used as a MongoDB server-local proxy for the remote LDAP service.
  3. SASL Library: creates necessary authentication data for MongoDB client and server.

Authentication session sequence

  • The client connects to a running mongod instance and creates a PLAIN authentication request using the SASL library.
  • The authentication request is then sent to the server as a special Mongo command, which is received by the mongod server with its request payload.
  • The server creates a SASL session derived from the client credentials, using its own reference to the SASL library.
  • The mongod server passes the authentication payload to the SASL library, which hands it over to the saslauthd daemon. The daemon passes it to LDAP and awaits a YES or NO response to the authentication request, checking whether the user exists and the submitted password is correct.
  • saslauthd passes this response back to the mongod server through the SASL library, which then authenticates or rejects the request accordingly.

Here is an illustration of this process:

To add an external user to a mongod server:

> db.getSiblingDB("$external").createUser( {user : username, roles: [ {role: "read", db: "test"} ]} );

External users, however, cannot have roles assigned in the admin database.

HashiCorp Vault Integration

HashiCorp Vault is a product designed to manage secrets and protect sensitive data by securely storing and tightly controlling access to confidential information. In previous Percona versions, the data-at-rest encryption key was stored locally on the server inside the key file. The integration with HashiCorp Vault secures the encryption key far better.

Enhanced Query Profiling

Profiling degrades database performance, especially when many queries are issued. Percona Server for MongoDB helps by limiting the number of queries collected by the database profiler, hence decreasing its impact on performance.

Conclusion

Percona Server for MongoDB is an enhanced, open source and highly scalable database that may act as a compatible drop-in replacement for MongoDB Community Edition, with the same syntax and configuration. It enhances data security, especially for data at rest, and improves database performance through the Percona Memory Engine, limits on the profiling rate, and other features.

Percona Server for MongoDB is fully supported by ClusterControl as an option for deployment.

How to Rebuild an Inconsistent PostgreSQL Slave


PostgreSQL Streaming Replication is a great way of scaling PostgreSQL clusters, and it adds high availability to them. As with every replication, the idea is that the slave is a copy of the master, constantly updated with the changes that happened on the master using some sort of replication mechanism.

It may happen that the slave, for some reason, gets out of sync with the master. How can I bring it back to the replication chain? How can I ensure that the slave is again in-sync with the master? Let’s take a look in this short blog post.

What is very helpful is that there is no way to write to a slave while it is in recovery mode. You can test it like this:

postgres=# SELECT pg_is_in_recovery();

 pg_is_in_recovery

-------------------

 t

(1 row)

postgres=# CREATE DATABASE mydb;

ERROR:  cannot execute CREATE DATABASE in a read-only transaction

It still may happen that the slave goes out of sync with the master. Data corruption: neither hardware nor software is without bugs and issues. Some problems with the disk drive may trigger data corruption on the slave. Some problems with the "vacuum" process may result in data being altered. How do you recover from that state?

Rebuilding the Slave Using pg_basebackup

The main step is to provision the slave using the data from the master. Given that we'll be using streaming replication, we cannot use a logical backup. Luckily there's a ready tool that can be used to set things up: pg_basebackup. Let's see what steps we need to take to provision a slave server. To make it clear, we are using PostgreSQL 12 for the purpose of this blog post.

The initial state is simple. Our slave is not replicating from its master. The data it contains is corrupted and can't be used or trusted. Therefore, the first step will be to stop PostgreSQL on our slave and remove the data it contains:

root@vagrant:~# systemctl stop postgresql

Or even:

root@vagrant:~# killall -9 postgres

Now, let's check the contents of the postgresql.auto.conf file; we can use the replication credentials stored in that file later, for pg_basebackup:

root@vagrant:~# cat /var/lib/postgresql/12/main/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

promote_trigger_file='/tmp/failover_5432.trigger'

recovery_target_timeline=latest

primary_conninfo='application_name=pgsql_0_node_1 host=10.0.0.126 port=5432 user=cmon_replication password=qZnVoV7LV97CFX9F'

We are interested in the user and password used for setting up the replication.

Finally we are ok to remove the data:

root@vagrant:~# rm -rf /var/lib/postgresql/12/main/*

Once the data is removed, we need to use pg_basebackup to get the data from the master:

root@vagrant:~# pg_basebackup -h 10.0.0.126 -U cmon_replication -Xs -P -R -D /var/lib/postgresql/12/main/

Password:

waiting for checkpoint

The flags that we used have the following meanings:

  • -Xs: we would like to stream WAL while the backup is created. This helps avoid problems with removing WAL files when you have a large dataset.
  • -P: we would like to see progress of the backup.
  • -R: we want pg_basebackup to create standby.signal file and prepare postgresql.auto.conf file with connection settings.

pg_basebackup will wait for a checkpoint before starting the backup. If it takes too long, you have two options. First, it is possible to set the checkpoint mode to fast in pg_basebackup using the '-c fast' option. Alternatively, you can force a checkpoint by executing:

postgres=# CHECKPOINT;

CHECKPOINT

One way or the other, pg_basebackup will start. With the -P flag we can track the progress:

 416906/1588478 kB (26%), 0/1 tablespace

Once the backup is ready, all we have to do is make sure the data directory content has the correct user and group assigned. We executed pg_basebackup as 'root', therefore we want to change the ownership to 'postgres':

root@vagrant:~# chown -R postgres.postgres /var/lib/postgresql/12/main/

That's all. We can start the slave and it should start to replicate from the master.

root@vagrant:~# systemctl start postgresql

You can double-check the replication progress by executing the following query on the master:

postgres=# SELECT * FROM pg_stat_replication;

  pid  | usesysid |     usename      | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
-------+----------+------------------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 23565 |    16385 | cmon_replication | pgsql_0_node_1   | 10.0.0.128  |                 |       51554 | 2020-02-27 15:25:00.002734+00 |              | streaming | 2/AA5EF370 | 2/AA5EF2B0 | 2/AA5EF2B0 | 2/AA5EF2B0 |           |           |            |             0 | async      | 2020-02-28 13:45:32.594213+00
 11914 |    16385 | cmon_replication | 12/main          | 10.0.0.127  |                 |       25058 | 2020-02-28 13:42:09.160576+00 |              | streaming | 2/AA5EF370 | 2/AA5EF2B0 | 2/AA5EF2B0 | 2/AA5EF2B0 |           |           |            |             0 | async      | 2020-02-28 13:45:42.41722+00
(2 rows)

As you can see, both slaves are replicating correctly.

Rebuilding the Slave Using ClusterControl

If you are a ClusterControl user you can easily achieve exactly the same just by picking an option from the UI.

The initial situation is that one of the slaves (10.0.0.127) is not working and it is not replicating. We deemed that the rebuild is the best option for us. 

As ClusterControl users all we have to do is to go to the “Nodes” tab and run “Rebuild Replication Slave” job.

Next, we have to pick the node to rebuild the slave from, and that is all. ClusterControl will use pg_basebackup to set up the replication slave and configure the replication as soon as the data is transferred.

After some time the job completes, and the slave is back in the replication chain:

As you can see, with just a couple of clicks, thanks to ClusterControl, we managed to rebuild our failed slave and bring it back to the cluster.

How to Enable TimescaleDB on an Existing PostgreSQL Database


If you have a PostgreSQL cluster up-and-running, and you need to handle data that changes with time (like metrics collected from a system) you should consider using a time-series database that is designed to store this kind of data.

TimescaleDB is an open-source time-series database optimized for fast ingest and complex queries that supports full SQL. It is based on PostgreSQL and it offers the best of NoSQL and Relational worlds for Time-series data. 

In this blog, we will see how to manually enable TimescaleDB in an existing PostgreSQL database and how to do the same task using ClusterControl.

Enabling TimescaleDB Manually

For this blog, we will use CentOS 7 as the operating system and PostgreSQL 11 as the database server.

By default, you don’t have TimescaleDB enabled for PostgreSQL:

world=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

So first, you need to add the corresponding repository to install the software:

$ cat /etc/yum.repos.d/timescale_timescaledb.repo

[timescale_timescaledb]

name=timescale_timescaledb

baseurl=https://packagecloud.io/timescale/timescaledb/el/7/\$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://packagecloud.io/timescale/timescaledb/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

metadata_expire=300

We will assume you have the PostgreSQL repository in place as this TimescaleDB installation will require dependencies from there.

Next step is to install the package:

$ yum install timescaledb-postgresql-11

And configure it in your current PostgreSQL database. For this, edit your postgresql.conf file and add 'timescaledb' to the shared_preload_libraries parameter:

shared_preload_libraries = 'timescaledb'

Or if you already have something added there:

shared_preload_libraries = 'pg_stat_statements,timescaledb'

You can also set timescaledb.max_background_workers to specify the maximum number of background workers:

timescaledb.max_background_workers=4

Keep in mind that this change requires a database service restart:

$ service postgresql-11 restart

And then, you will have your TimescaleDB installed:

postgres=# SELECT * FROM pg_available_extensions WHERE name='timescaledb';
    name     | default_version | installed_version |                              comment
-------------+-----------------+-------------------+--------------------------------------------------------------------
 timescaledb | 1.6.0           |                   | Enables scalable inserts and complex queries for time-series data
(1 row)

So now, you need to enable it:

$ psql world

world=# CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

WARNING:

WELCOME TO

 _____ _                               _ ____________

|_   _(_)                             | | | _ \ ___ \

  | |  _ _ __ ___   ___ ___ ___ __ _| | ___| | | | |_/ /

  | | | |  _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \

  | | | | | | | | |  __/\__ \ (_| (_| | |  __/ |/ /| |_/ /

  |_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/

               Running version 1.6.0

For more information on TimescaleDB, please visit the following links:



 1. Getting started: https://docs.timescale.com/getting-started

 2. API reference documentation: https://docs.timescale.com/api

 3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture



Note: TimescaleDB collects anonymous reports to better understand and assist our users.

For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.



CREATE EXTENSION

Done.

world=# \dx
                                   List of installed extensions
    Name     | Version |   Schema   |                            Description
-------------+---------+------------+--------------------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb | 1.6.0   | public     | Enables scalable inserts and complex queries for time-series data
(2 rows)

Now, let’s see how to enable it using ClusterControl.

Using ClusterControl to Enable TimescaleDB

We will assume you have your PostgreSQL cluster imported in ClusterControl, or even deployed using it.

To enable TimescaleDB using ClusterControl, you just need to go to your PostgreSQL Cluster Actions and press on the “Enable TimescaleDB” option.

You will receive a warning about the database restart. Confirm it.

You can monitor the task in the ClusterControl Activity section.

Then you will have your TimescaleDB ready to use.

Conclusion

Now that you have TimescaleDB up and running, you can handle your time-series data in a more performant way. For this, you can create new tables or even migrate your current data, and of course, you should learn how to use it to take advantage of this new concept.
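As a minimal sketch of that workflow, converting a regular table into a hypertable is done with the create_hypertable function (the table and columns here are hypothetical):

world=# CREATE TABLE conditions (time TIMESTAMPTZ NOT NULL, location TEXT, temperature DOUBLE PRECISION);
world=# SELECT create_hypertable('conditions', 'time');

From then on, standard INSERT and SELECT statements against the table work unchanged, while TimescaleDB partitions the data into time-based chunks behind the scenes.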

Using MariaDB Flashback on a MySQL Server


MariaDB has introduced a very cool feature called Flashback. Flashback is a feature that will allow instances, databases or tables to be rolled back to an old snapshot. Traditionally, to perform a point-in-time recovery (PITR), one would restore a database from a backup, and replay the binary logs to roll forward the database state at a certain time or position. 

With Flashback, the database can be rolled back to a point of time in the past, which is way faster if we just want to see a past state from not long ago. Using flashback might be inefficient if you want to see a very old snapshot of your data relative to the current date and time; restoring from a delayed slave, or from a backup plus replaying the binary logs, might be the better option.

This feature is only available in the MariaDB client package, but that doesn't mean we can not use it with our MySQL servers. This blog post showcases how we can use this amazing feature on a MySQL server.

MariaDB Flashback Requirements

For those who want to use MariaDB flashback feature on top of MySQL, we can basically do the following:

  1. Enable binary log with the following setting:
    1. binlog_format = ROW (default since MySQL 5.7.7).
    2. binlog_row_image = FULL (default since MySQL 5.6).
  2. Use the mysqlbinlog utility from any MariaDB 10.2.4 or later installation.
  3. Flashback is currently supported only over DML statements (INSERT, DELETE, UPDATE). An upcoming version of MariaDB will add support for flashback over DDL statements (DROP, TRUNCATE, ALTER, etc.) by copying or moving the current table to a reserved and hidden database, and then copying or moving back when using flashback.

The flashback is achieved by taking advantage of existing support for full image format binary logs, thus it supports all storage engines. Note that the flashback events will be stored in memory. Therefore, you should make sure your server has enough memory for this feature.

How Does MariaDB Flashback Work?

MariaDB's mysqlbinlog utility comes with two extra options for this purpose:

  • -B, --flashback - The flashback feature can roll back your committed data to a specific point in time.
  • -T, --table=[name] - List entries for just this table (local log only).

By comparing the mysqlbinlog output with and without the --flashback flag, we can easily understand how it works. Consider the following statement executed on a MariaDB server:

MariaDB> DELETE FROM sbtest.sbtest1 WHERE id = 1;

Without flashback flag, we will see the actual DELETE binlog event:

$ mysqlbinlog -vv \

--start-datetime="$(date '+%F %T' -d 'now - 10 minutes')" \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000003

...

# at 453196541

#200227 12:58:18 server id 37001  end_log_pos 453196766 CRC32 0xdaa248ed Delete_rows: table id 238 flags: STMT_END_F



BINLOG '

6rxXXhOJkAAAQwAAAP06AxsAAO4AAAAAAAEABnNidGVzdAAHc2J0ZXN0MQAEAwP+/gTu4P7wAAEB

AAID/P8AFuAQfA==

6rxXXiCJkAAA4QAAAN47AxsAAO4AAAAAAAEAAgAE/wABAAAAVJ4HAHcAODM4Njg2NDE5MTItMjg3

NzM5NzI4MzctNjA3MzYxMjA0ODYtNzUxNjI2NTk5MDYtMjc1NjM1MjY0OTQtMjAzODE4ODc0MDQt

NDE1NzY0MjIyNDEtOTM0MjY3OTM5NjQtNTY0MDUwNjUxMDItMzM1MTg0MzIzMzA7Njc4NDc5Njcz

NzctNDgwMDA5NjMzMjItNjI2MDQ3ODUzMDEtOTE0MTU0OTE4OTgtOTY5MjY1MjAyOTHtSKLa

'/*!*/;

### DELETE FROM `sbtest`.`sbtest1`

### WHERE

###   @1=1 /* INT meta=0 nullable=0 is_null=0 */

###   @2=499284 /* INT meta=0 nullable=0 is_null=0 */

###   @3='83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330' /* STRING(480) meta=61152 nullable=0 is_null=0 */

###   @4='67847967377-48000963322-62604785301-91415491898-96926520291' /* STRING(240) meta=65264 nullable=0 is_null=0 */

...

By extending the above mysqlbinlog command with --flashback, we can see the DELETE event converted to an INSERT event, and similarly the respective WHERE and SET clauses swapped:

$ mysqlbinlog -vv \

--start-datetime="$(date '+%F %T' -d 'now - 10 minutes')" \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000003 \

--flashback

...

BINLOG '

6rxXXhOJkAAAQwAAAP06AxsAAO4AAAAAAAEABnNidGVzdAAHc2J0ZXN0MQAEAwP+/gTu4P7wAAEB

AAID/P8AFuAQfA==

6rxXXh6JkAAA4QAAAN47AxsAAO4AAAAAAAEAAgAE/wABAAAAVJ4HAHcAODM4Njg2NDE5MTItMjg3

NzM5NzI4MzctNjA3MzYxMjA0ODYtNzUxNjI2NTk5MDYtMjc1NjM1MjY0OTQtMjAzODE4ODc0MDQt

NDE1NzY0MjIyNDEtOTM0MjY3OTM5NjQtNTY0MDUwNjUxMDItMzM1MTg0MzIzMzA7Njc4NDc5Njcz

NzctNDgwMDA5NjMzMjItNjI2MDQ3ODUzMDEtOTE0MTU0OTE4OTgtOTY5MjY1MjAyOTHtSKLa

'/*!*/;

### INSERT INTO `sbtest`.`sbtest1`

### SET

###   @1=1 /* INT meta=0 nullable=0 is_null=0 */

###   @2=499284 /* INT meta=0 nullable=0 is_null=0 */

###   @3='83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330' /* STRING(480) meta=61152 nullable=0 is_null=0 */

###   @4='67847967377-48000963322-62604785301-91415491898-96926520291' /* STRING(240) meta=65264 nullable=0 is_null=0 */

...

In row-based replication (binlog_format=ROW), each row change event contains two images, a “before” image (except INSERT) whose columns are matched against when searching for the row to be updated, and an “after” image (except DELETE) containing the changes. With binlog_row_image = FULL, MariaDB logs full rows (that is, all columns) for both the before and after images.

The following example shows binary log events for UPDATE. Suppose the following statement is executed on a MariaDB server:

MariaDB> UPDATE sbtest.sbtest1 SET k = 0 WHERE id = 5;

When looking at the binlog event for the above statement, we will see something like this:

$ mysqlbinlog -vv \

--start-datetime="$(date '+%F %T' -d 'now - 5 minutes')" \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000001 

...

### UPDATE `sbtest`.`sbtest1`

### WHERE

###   @1=5 /* INT meta=0 nullable=0 is_null=0 */

###   @2=499813 /* INT meta=0 nullable=0 is_null=0 */

###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */

###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */

### SET

###   @1=5 /* INT meta=0 nullable=0 is_null=0 */

###   @2=0 /* INT meta=0 nullable=0 is_null=0 */

###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */

###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */

# Number of rows: 1

...

With the --flashback flag, the "before" image is swapped with the "after" image of the existing row:

$ mysqlbinlog -vv \

--start-datetime="$(date '+%F %T' -d 'now - 5 minutes')" \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000001 \

 --flashback

...

### UPDATE `sbtest`.`sbtest1`

### WHERE

###   @1=5 /* INT meta=0 nullable=0 is_null=0 */

###   @2=0 /* INT meta=0 nullable=0 is_null=0 */

###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */

###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */

### SET

###   @1=5 /* INT meta=0 nullable=0 is_null=0 */

###   @2=499813 /* INT meta=0 nullable=0 is_null=0 */

###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */

###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */

...

We can then redirect the flashback output to the MySQL client, rolling back the database or table to the point in time that we want. More examples are shown in the next sections.
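
As a quick illustration (the starting position is a placeholder, and the database, table, and binlog path follow the examples above), applying the rollback could be a single pipe:

$ mysqlbinlog --flashback --start-position=<position> --database=sbtest --table=sbtest1 /var/lib/mysql/binlog.000003 | mysql -uroot -p sbtest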

MariaDB has a dedicated knowledge base page for this feature; check out the MariaDB Flashback knowledge base page for more details.

MariaDB Flashback With MySQL

To have the flashback ability for MySQL, one has to do the following:

  • Copy the mysqlbinlog utility from any MariaDB server (10.2.4 or later).
  • Disable MySQL GTID before applying the flashback SQL file. The global variables gtid_mode and enforce_gtid_consistency can be set at runtime since MySQL 5.7.5.

Suppose we have the following simple MySQL 8.0 replication topology:

In this example, we copied the mysqlbinlog utility from the latest MariaDB 10.4 to one of our MySQL 8.0 slaves (slave2):

(mariadb-server)$ scp /bin/mysqlbinlog root@slave2-mysql:/root/

(slave2-mysql8)$ ls -l /root/mysqlbinlog

-rwxr-xr-x. 1 root root 4259504 Feb 27 13:44 /root/mysqlbinlog

MariaDB's mysqlbinlog utility is now located at /root/mysqlbinlog on slave2. On the MySQL master, we executed the following disastrous statement:

mysql> DELETE FROM sbtest1 WHERE id BETWEEN 5 AND 100;

Query OK, 96 rows affected (0.01 sec)

96 rows were deleted by the above statement. Wait a couple of seconds for the events to replicate from the master to all slaves, then try to find the binlog position of the disastrous event on the slave server. The first step is to retrieve all the binary logs on that server:

mysql> SHOW BINARY LOGS;

+---------------+-----------+-----------+

| Log_name      | File_size | Encrypted |

+---------------+-----------+-----------+

| binlog.000001 |       850 | No |

| binlog.000002 |     18796 | No |

+---------------+-----------+-----------+

Our disastrous event should exist inside binlog.000002, the latest binary log on this server. We can then use MariaDB's mysqlbinlog utility to retrieve all binlog events for table sbtest1 from the last 10 minutes:

(slave2-mysql8)$ /root/mysqlbinlog -vv \

--start-datetime="$(date '+%F %T' -d 'now - 10 minutes')" \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000002

...

# at 195

#200228 15:09:45 server id 37001  end_log_pos 281 CRC32 0x99547474 Ignorable

# Ignorable event type 33 (MySQL Gtid)

# at 281

#200228 15:09:45 server id 37001  end_log_pos 353 CRC32 0x8b12bd3c Query thread_id=19 exec_time=0 error_code=0

SET TIMESTAMP=1582902585/*!*/;

SET @@session.pseudo_thread_id=19/*!*/;

SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1, @@session.check_constraint_checks=1/*!*/;

SET @@session.sql_mode=524288/*!*/;

SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;

SET @@session.character_set_client=255,@@session.collation_connection=255,@@session.collation_server=255/*!*/;

SET @@session.lc_time_names=0/*!*/;

SET @@session.collation_database=DEFAULT/*!*/;

BEGIN

/*!*/;

# at 353

#200228 15:09:45 server id 37001  end_log_pos 420 CRC32 0xe0e44a1b Table_map: `sbtest`.`sbtest1` mapped to number 92

# at 420

# at 8625

# at 16830

#200228 15:09:45 server id 37001  end_log_pos 8625 CRC32 0x99b1a8fc Delete_rows: table id 92

#200228 15:09:45 server id 37001  end_log_pos 16830 CRC32 0x89496a07 Delete_rows: table id 92

#200228 15:09:45 server id 37001  end_log_pos 18765 CRC32 0x302413b2 Delete_rows: table id 92 flags: STMT_END_F

To easily look up the binlog position number, pay attention to the lines that start with "# at ". From the above lines, we can see the DELETE event happened at position 281 inside binlog.000002 (it starts at "# at 281"). We can also retrieve the binlog events directly inside a MySQL server:

mysql> SHOW BINLOG EVENTS IN 'binlog.000002';

+---------------+-------+----------------+-----------+-------------+-------------------------------------------------------------------+

| Log_name      | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |

+---------------+-------+----------------+-----------+-------------+-------------------------------------------------------------------+

| binlog.000002 |     4 | Format_desc |   37003 | 124 | Server ver: 8.0.19, Binlog ver: 4                                 |

| binlog.000002 |   124 | Previous_gtids |     37003 | 195 | 0d98d975-59f8-11ea-bd30-525400261060:1                            |

| binlog.000002 |   195 | Gtid |     37001 | 281 | SET @@SESSION.GTID_NEXT= '0d98d975-59f8-11ea-bd30-525400261060:2' |

| binlog.000002 |   281 | Query |     37001 | 353 | BEGIN                                                             |

| binlog.000002 |   353 | Table_map |     37001 | 420 | table_id: 92 (sbtest.sbtest1)                                     |

| binlog.000002 |   420 | Delete_rows |     37001 | 8625 | table_id: 92                                                      |

| binlog.000002 |  8625 | Delete_rows   | 37001 | 16830 | table_id: 92                                                      |

| binlog.000002 | 16830 | Delete_rows    | 37001 | 18765 | table_id: 92 flags: STMT_END_F                                    |

| binlog.000002 | 18765 | Xid            | 37001 | 18796 | COMMIT /* xid=171006 */                                           |

+---------------+-------+----------------+-----------+-------------+-------------------------------------------------------------------+

9 rows in set (0.00 sec)

We can now confirm that position 281 is where we want our data to revert to. We can then use the --start-position flag to generate accurate flashback events. Notice that we omit the "-vv" flag and add the --flashback flag:

(slave2-mysql8)$ /root/mysqlbinlog \

--start-position=281 \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000002 \

--flashback > /root/flashback.binlog

The flashback.binlog file now contains all the required events to undo all changes that happened to table sbtest1 on this MySQL server. Since this is a slave node of a replication cluster, we have to break the replication on the chosen slave (slave2) in order to use it for flashback purposes. To do this, we have to stop the replication on the chosen slave, set MySQL GTID to ON_PERMISSIVE, and make the slave writable:

(slave2-mysql)$ mysql -uroot -p

mysql> STOP SLAVE; 

SET GLOBAL gtid_mode = ON_PERMISSIVE; 

SET GLOBAL enforce_gtid_consistency = OFF; 

SET GLOBAL read_only = OFF;

At this point, slave2 is not part of the replication and our topology is looking like this:

Import the flashback file via the mysql client, making sure the change is not recorded in the MySQL binary log:

(slave2-mysql8)$ mysql -uroot -p --init-command='SET sql_log_bin=0' sbtest < /root/flashback.binlog

We can then see that all the deleted rows are back, as proven by the following statement:

mysql> SELECT COUNT(id) FROM sbtest1 WHERE id BETWEEN 5 and 100;

+-----------+

| COUNT(id) |

+-----------+

|        96 |

+-----------+

1 row in set (0.00 sec)

We can then create an SQL dump file for table sbtest1 for our reference:

(slave2-mysql8)$ mysqldump -uroot -p --single-transaction sbtest sbtest1 > sbtest1_flashbacked.sql

Once the flashback operation completes, we can rejoin the slave node to the replication chain. But first, we have to bring the database back into a consistent state by replaying all events starting from the position we flashed back from. Don't forget to skip binary logging, as we do not want to "write" onto the slave and risk introducing errant transactions:

(slave2-mysql8)$ /root/mysqlbinlog \

--start-position=281 \

--database=sbtest \

--table=sbtest1 \

/var/lib/mysql/binlog.000002 | mysql -uroot -p --init-command='SET sql_log_bin=0' sbtest

Finally, prepare the node back to its role as MySQL slave and start the replication:

mysql> SET GLOBAL read_only = ON;

SET GLOBAL enforce_gtid_consistency = ON; 

SET GLOBAL gtid_mode = ON; 

START SLAVE; 

Verify that the slave node is replicating correctly:

mysql> SHOW SLAVE STATUS\G

...

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

...

At this point, we have re-joined the slave back into the replication chain and our topology is now back to its original state:

Shout out to the MariaDB team for introducing this astounding feature!

Setting Up a Geo-Location Database Cluster Using MySQL Replication


A single point of failure (SPOF) is a common reason why organizations work towards distributing their database environments to other geographical locations. It's part of their Disaster Recovery and Business Continuity strategic plans.

Disaster Recovery (DR) planning embodies technical procedures which cover the preparation for unanticipated issues such as natural disasters, accidents (such as human error), or incidents (such as criminal acts). 

For the past decade, distributing your database environment across multiple geographical locations has been a pretty common setup, as public clouds offer a lot of ways to deal with this. The challenge comes in setting up the database environments: managing the database(s), moving your data to another geo-location, and applying security with a high level of observability.

In this blog, we'll showcase how you can do this using MySQL Replication. We'll cover how to copy your data to another database node located in a different country, far from the current geography of the MySQL cluster. For this example, our target region is us-east, while the on-prem environment is in Asia, located in the Philippines.

Why Do I Need A Geo-Location Database Cluster?

Even Amazon AWS, the top public cloud provider, suffers from downtime and unintended outages (like the one that happened in 2017). Say you are using AWS as a secondary datacenter alongside your on-prem setup. You have no internal access to its underlying hardware or to the internal networks managing your compute nodes. These are fully managed services that you pay for, but you cannot avoid the fact that they can suffer an outage at any time. If such a geographic location suffers an outage, you can face long downtime.

This type of problem must be foreseen during your business continuity planning, analyzed, and addressed based on what has been defined. Business continuity for your MySQL databases must include high uptime. Some environments run benchmarks and set a high bar of rigorous tests, including probing the weak spots, to expose vulnerabilities and to verify how resilient and scalable the technology architecture is, including the database infrastructure. For businesses, especially those handling a high volume of transactions, it is imperative to ensure that production databases are available to the applications at all times, even when catastrophe occurs. Otherwise, downtime can be experienced, and it might cost you a large amount of money.

With these scenarios identified, organizations start extending their infrastructure to different cloud providers and putting nodes in different geo-locations to achieve higher uptime (as close to 99.999...% as possible), a lower RPO, and no SPOF.

To ensure production databases survive a disaster, a Disaster Recovery (DR) site must be configured. Production and DR sites must be part of two geographically distant datacenters. This means a standby database must be configured at the DR site for every production database, so that the data changes occurring on the production database are immediately synced across to the standby database via transaction logs. Some setups also use their DR nodes to handle reads, providing load balancing between the application and the data layer.

The Desired Architectural Setup

The desired setup in this blog is simple, yet a very common implementation nowadays. See below for the desired architectural setup:

In this blog, I chose Google Cloud Platform (GCP) as the public cloud provider, and I am using my local network as my on-prem database environment.

When using this type of design, you always need both environments or platforms to communicate in a very secure manner, using a VPN or alternatives such as AWS Direct Connect. Although public clouds nowadays offer managed VPN services that you can use, for this setup we'll be using OpenVPN, since I don't need sophisticated hardware or services for this blog.

Best and Most Efficient Way

For MySQL/Percona/MariaDB database environments, the best and most efficient way is to take a backup copy of your database and send it to the target node to be deployed or instantiated. There are different tools for this approach: you can use mysqldump, mydumper, or rsync, or use Percona XtraBackup/Mariabackup and stream the data to your target node.

Using mysqldump

mysqldump creates a logical backup of your whole database, or you can selectively choose a list of databases, tables, or even specific records that you want to dump.

A simple command that you can use to take a full backup and pipe it straight to the target node is:

$ mysqldump --single-transaction --all-databases --triggers --routines --events --master-data | mysql -h <target-host> -u<user> -p<password> -vvv --show-warnings

With this simple command, the MySQL statements are run directly on the target database node, for example your target database node on Google Compute Engine. This can be efficient when the data is small or you have fast bandwidth. Otherwise, packing your database into a file and then sending it to the target node can be an option:

$ mysqldump --single-transaction --all-databases --triggers --routines --events --master-data | gzip > mydata.db

$ scp mydata.db <target-host>:/some/path

Then load the dump on the target database node:

$ zcat mydata.db | mysql -u<user> -p

The downside of a logical backup with mysqldump is that it's slower and consumes disk space. It also uses a single thread, so you cannot run it in parallel. Optionally, you can use mydumper, especially when your data is huge. mydumper can run in parallel, but it's not as flexible as mysqldump.
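
As a rough sketch (the output directory and thread counts are illustrative assumptions), a parallel dump and restore with mydumper and myloader could look like this:

## On the source node: dump with 4 parallel threads, compressed
$ mydumper --threads=4 --compress --outputdir=/backup/mydumper

## On the target node: restore with 4 parallel threads
$ myloader --threads=4 --overwrite-tables --directory=/backup/mydumper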

Using xtrabackup

xtrabackup is a physical backup method; you can send the stream or binary data to the target node. This is very efficient and is mostly used when streaming a backup over the network, especially when the target node is in a different geography or region. ClusterControl uses xtrabackup when provisioning or instantiating a new slave, regardless of where it is located, as long as access and permissions have been set up prior to the action.

If you are running xtrabackup manually, you can use the following commands:

## Target node

$  socat -u tcp-listen:9999,reuseaddr stdout 2>/tmp/netcat.log | xbstream -x -C /var/lib/mysql

## Source node

$ innobackupex --defaults-file=/etc/my.cnf --stream=xbstream --socket=/var/lib/mysql/mysql.sock  --host=localhost --tmpdir=/tmp /tmp | socat -u stdio TCP:192.168.10.70:9999

To elaborate on those two commands: the first has to be executed on the target node before the second is run on the source node. The target node command listens on port 9999 and writes any stream received on that port into /var/lib/mysql on the target node. It depends on the socat and xbstream commands, so you must ensure you have these packages installed.

On the source node, the innobackupex perl script is executed; it invokes xtrabackup in the background and uses xbstream to stream the data over the network. The socat command opens port 9999 and sends its data to the desired host, which is 192.168.10.70 in this example. Again, ensure that you have socat and xbstream installed when using this command. An alternative to socat is nc, but socat offers more advanced features, such as allowing multiple clients to listen on a port.

ClusterControl uses this command when rebuilding a slave or building a new slave. It is fast and guarantees that an exact copy of your source data is copied to your target node. When provisioning a new database in a separate geo-location, this approach offers efficiency and speed to finish the job, although there are pros and cons to both logical and binary backups when streamed over the wire. This method is a very common approach when setting up a new geo-location database cluster in a different region and creating an exact copy of your database environment.

Efficiency, Observability, and Speed

People who are not familiar with this approach usually ask questions covering the "how, what, where" problems. In this section, we'll cover how you can efficiently set up your geo-location database with less work to deal with, and with observability into why things fail. Using ClusterControl is very efficient. In my current setup, the following environment was initially implemented:

Extending Node to GCP

To start setting up your geo-location database cluster and create a snapshot copy of your cluster, you can extend it by adding a new slave. As mentioned earlier, ClusterControl will use xtrabackup (mariabackup for MariaDB 10.2 onwards) and deploy the new node within your cluster. Before you can register your GCP compute nodes as target nodes, you first need to set up the same system user you registered with ClusterControl. You can verify this in /etc/cmon.d/cmon_X.cnf, where X is the cluster_id. For example, see below:

# grep 'ssh_user' /etc/cmon.d/cmon_27.cnf 

ssh_user=maximus

maximus (in this example) must be present on your GCP compute nodes. The user on your GCP nodes must have sudo or super admin privileges. It must also be set up with password-less SSH access. Please read our documentation for more about the system user and its required privileges.

Let's have an example list of servers below (from GCP console: Compute Engine dashboard):

In the screenshot above, our target region is the us-east region. As noted earlier, my local network is set up over a secure layer going through GCP (and vice versa) using OpenVPN, so communication from GCP to my local network is also encapsulated over the VPN tunnel.

Add a Slave Node To GCP

The screenshot below reveals how you can do this. See images below:

As seen in the second screenshot, we're targeting node 10.142.0.12 and its source master is 192.168.70.10. ClusterControl is smart enough to determine the firewalls, security modules, packages, configuration, and setup that need to be done. See below an example of the job activity log:

Quite a simple task, isn't it?

Complete The GCP MySQL Cluster

We need to add two more nodes to the GCP cluster to have a balanced topology, as we had in the local network. For the second and third node, ensure that the master points to your GCP node. In this example, the master is 10.142.0.12. See below how to do this:

As seen in the screenshot above, I selected 10.142.0.12 (slave), which is the first node we added to the cluster. The complete result shows as follows:

Your Final Setup of Geo-Location Database Cluster

From the last screenshot, this kind of topology might not be your ideal setup. Mostly, it has to be a multi-master setup, where your DR cluster serves as the standby cluster and your on-prem serves as the primary active cluster. Achieving this is quite simple in ClusterControl. See the following screenshots to achieve this goal.

You can just drag your current master onto the target master that has to be set up as a primary-standby writer, in case your on-prem setup comes to harm. In this example, we drag targeting host 10.142.0.12 (GCP compute node). The end result is shown below:

This achieves the desired result. It's easy, and very quick, to spawn your geo-location database cluster using MySQL Replication.

Conclusion

Having a geo-location database cluster is not new. It has been a desired setup for companies and organizations that avoid SPOF and want resilience and a lower RPO.

The main takeaways for this setup are security, redundancy, and resilience. It also shows how feasibly and efficiently you can deploy your new cluster to a different geographic region. While ClusterControl can offer this, expect more improvements soon, where you will be able to efficiently create a new cluster from a backup and spawn it in a different region through ClusterControl, so stay tuned.


How to Easily Manage Database Updates and Security Patches


Database security requires careful planning, but it is important to remember that security is not a state, it is a process. Once the database is in place, monitoring, alerting and reporting on changes are an integral part of the ongoing management. Also, security efforts need to be aligned with business needs.

Database vendors regularly issue critical patch updates to address software bugs or known vulnerabilities, but for a variety of reasons, organizations are often unable to install them in a timely manner, if at all. Evidence suggests that companies are actually getting worse at patching databases, with an increased number violating compliance standards and governance policies. Patching that requires database downtime would be of extreme concern in a 24/7 environment; however, most cluster upgrades can be performed online.

ClusterControl is able to perform a rolling upgrade of a distributed environment, upgrading and restarting one node at a time. The logical upgrade steps might slightly differ between the different cluster types. Load balancers would automatically blacklist unavailable nodes that are currently being upgraded, so that applications are not affected. 

Operational reporting on version upgrades and patches is an area that requires constant attention, especially with the proliferation of open source databases in many organizations and more database environments being distributed for high availability.

ClusterControl provides a solid operational reporting framework and can help answer simple questions like:

  • What versions of the software are running across the environment?
  • Which servers should be upgraded?
  • Which servers are missing critical updates?

Automatic Database Patching

ClusterControl provides the ability for automatic rolling upgrades for MySQL & MariaDB to ensure that your databases always use the latest patches and fixes.

Upgrades are online and are performed one node at a time. The node will be stopped, then the software will be updated, and then the node will be started again. If a node fails to upgrade, the upgrade process is aborted.

Rolling MySQL Database Upgrades

ClusterControl provides the ability for automatic rolling upgrades for MySQL-based database clusters by automatically applying the upgrade one node at a time which results in zero downtime.

After successfully installing the selected version, you must perform a rolling restart - the nodes restart one by one.

ClusterControl supports you in that step, making sure nodes respond properly during the restart.

Database Upgrade Assistance

ClusterControl makes it easy to upgrade your MongoDB and PostgreSQL databases: with a simple click, you can promote a slave or replica, allowing you to upgrade the master, and vice versa.

Database Package Summary Operational Report

ClusterControl provides the Package Summary Operational Report, which shows you how many technology and security patches are available for upgrade.

You can generate it ad-hoc and view it in the UI, send it via email, or schedule the report to be delivered to you, for example, once per week.

As you can see, the Upgrade Report contains information about the different hosts in the cluster, which database has been installed on them, and in which version. It also contains information about how many other installed packages are not up to date. You can see the total number, how many are related to database services, how many provide security updates, and the rest of them.

The Upgrade Report lists all of the not-up-to-date packages on a per-host basis. In the screenshot above you can see that node 10.0.3.10 has two MongoDB util packages that are not up to date (those are the 2 DB packages mentioned in the summary). Then there is a list of security packages and all other packages that are not up to date.

Conclusion

ClusterControl goes the extra mile to make sure you are covered regarding security (and other) updates. As you have seen, it is very easy to know if your systems are up to date. ClusterControl can also assist in performing the upgrade of the database nodes.

How to Rebuild an Inconsistent MySQL Slave?


MySQL slaves may become inconsistent. You can try to avoid it, but it’s really hard. Setting super_read_only and using row-based replication can help a lot, but no matter what you do, it is still possible that your slave will become inconsistent. 

What can be done to rebuild an inconsistent MySQL slave? In this blog post we’ll take a look at this problem.

First off, let's discuss what has to happen in order to rebuild a slave. To bring a node into MySQL Replication, it has to be provisioned with data from one of the nodes in the replication topology. This data has to be consistent at the point in time when it was collected. You cannot take it on a table-by-table or schema-by-schema basis, because this would make the provisioned node internally inconsistent, with some data older than other parts of the dataset.

In addition to data consistency, it should also be possible to collect information about the relationship between the data and the state of replication. You want either the binary log position at which the collected data is consistent, or the Global Transaction ID of the last transaction executed on the node that is the source of the data.

This leads us to the following considerations. You can rebuild a slave using any backup tool, as long as the tool can generate a consistent backup and includes the replication coordinates for the point in time at which the backup is consistent. This allows us to pick from a couple of options.

Using Mysqldump to Rebuild an Inconsistent MySQL Slave

Mysqldump is the most basic tool we have to achieve this. It allows us to create a logical backup, among others, in the form of SQL statements. What is important is that, while being basic, it still allows us to take a consistent backup: it can use a transaction to ensure that the data is consistent at the beginning of the transaction. It can also write down the replication coordinates for that point, even a whole CHANGE MASTER statement, making it easy to start replication using the backup.
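
As a minimal sketch, a consistent dump that records the replication coordinates could look like this (--master-data=2 writes the CHANGE MASTER statement into the dump as a comment; the output path is an arbitrary choice):

$ mysqldump --single-transaction --master-data=2 --all-databases --triggers --routines --events > /root/backup.sql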

Using Mydumper to Rebuild an Inconsistent MySQL Slave

Another option is to use mydumper. This tool, just like mysqldump, generates a logical backup and, just like mysqldump, can be used to create a consistent backup of the database. The main difference is that mydumper, when paired with myloader, can dump and restore data in parallel, improving the dump and, especially, the restore time.

Using a Snapshot to Rebuild an Inconsistent MySQL Slave

For those who use cloud providers, a possibility is to take a snapshot of the underlying block storage. Snapshots generate a point-in-time view of the data. This process is quite tricky, though, as the consistency of the data and the ability to restore it depend mostly on the MySQL configuration.

You should ensure that the database works in a durable mode (that it is configured in a way that a crash of MySQL will not result in any data loss). This is because (from a MySQL standpoint) taking a volume snapshot and then starting another MySQL instance off the data stored in it is basically the same as if you had kill -9'ed mysqld and then started it again. InnoDB recovery has to happen: replaying transactions that have been stored in binary logs, rolling back transactions that hadn't completed before the crash, and so on.
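
As a sketch, a fully durable InnoDB configuration typically includes settings like these in my.cnf:

[mysqld]
innodb_flush_log_at_trx_commit = 1   # flush and sync the redo log on every commit
sync_binlog = 1                      # sync the binary log on every commit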

The downside of snapshot-based rebuild process is that it is strongly tied to the current vendor. You cannot easily copy the snapshot data from one cloud provider to another one. You may be able to move it between different regions but it will still be the same provider.

Using a Xtrabackup or Mariabackup to Rebuild an Inconsistent MySQL Slave

Finally, there is xtrabackup/mariabackup - a tool written by Percona and forked by MariaDB that allows you to generate a physical backup. It is way faster than logical backups - it's limited mostly by hardware performance, with disk or network being the most probable bottlenecks. Most of the workload is related to copying files from the MySQL data directory to another location (on the same host or over the network).

While not nearly as fast as block storage snapshots, xtrabackup is way more flexible and can be used in any environment. The backup it produces consists of files, therefore it is perfectly possible to copy the backup to any location you like. Another cloud provider, your local datacenter, it doesn’t matter as long as you can transfer files from your current location. 

It doesn't even require network connectivity - you can just as well copy the backup to some "transferable" device like a USB SSD or even a USB stick, as long as it can hold all of the data, and carry it in your pocket while you relocate from one datacenter to another.

How to Rebuild a MySQL Slave Using Xtrabackup?

We decided to focus on xtrabackup, given its flexibility and ability to work in most environments where MySQL runs. How do you rebuild your slave using xtrabackup? Let's take a look.

Initially, we have a master and a slave, which suffered from some replication issues:

mysql> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.0.0.141

                  Master_User: rpl_user

                  Master_Port: 3306

                Connect_Retry: 10

              Master_Log_File: binlog.000004

          Read_Master_Log_Pos: 386

               Relay_Log_File: relay-bin.000008

                Relay_Log_Pos: 363

        Relay_Master_Log_File: binlog.000004

             Slave_IO_Running: Yes

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 1007

                   Last_Error: Error 'Can't create database 'mytest'; database exists' on query. Default database: 'mytest'. Query: 'create database mytest'

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 195

              Relay_Log_Space: 756

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 1007

               Last_SQL_Error: Error 'Can't create database 'mytest'; database exists' on query. Default database: 'mytest'. Query: 'create database mytest'

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1001

                  Master_UUID: 53d96192-53f7-11ea-9c3c-080027c5bc64

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State:

           Master_Retry_Count: 86400

                  Master_Bind:

      Last_IO_Error_Timestamp:

     Last_SQL_Error_Timestamp: 200306 11:47:42

               Master_SSL_Crl:

           Master_SSL_Crlpath:

           Retrieved_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:9

            Executed_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:1-8,

ce7d0c38-53f7-11ea-9f16-080027c5bc64:1-3

                Auto_Position: 1

         Replicate_Rewrite_DB:

                 Channel_Name:

           Master_TLS_Version:

       Master_public_key_path:

        Get_master_public_key: 0

            Network_Namespace:

1 row in set (0.00 sec)

As you can see, there is a problem with one of the schemas. Let’s assume we have to rebuild this node to bring it back into the replication. Here are the steps we have to perform.

First, we have to make sure xtrabackup is installed. In our case we use MySQL 8.0, therefore we have to use xtrabackup version 8 to ensure compatibility:

root@master:~# apt install percona-xtrabackup-80

Reading package lists... Done

Building dependency tree

Reading state information... Done

percona-xtrabackup-80 is already the newest version (8.0.9-1.bionic).

0 upgraded, 0 newly installed, 0 to remove and 143 not upgraded.

Xtrabackup is provided by the Percona repository, and the installation guide can be found here:

https://www.percona.com/doc/percona-xtrabackup/8.0/installation/apt_repo.html

The tool has to be installed on both the master and the slave that we want to rebuild.

As a next step we will remove all the data from the “broken” slave:

root@slave:~# service mysql stop

root@slave:~# rm -rf /var/lib/mysql/*

Next, we will take the backup on the master and stream it to the slave. Please keep in mind that this particular one-liner requires passwordless SSH root connectivity from the master to the slave:

root@master:~# xtrabackup --backup --compress --stream=xbstream --target-dir=./ | ssh root@10.0.0.142 "xbstream -x --decompress -C /var/lib/mysql/"

At the end you should see an important line:

200306 12:10:40 completed OK!

This indicates that the backup completed OK. A couple of things may still go wrong, but at least we got the data right. Next, on the slave, we have to prepare the backup:

root@slave:~# xtrabackup --prepare --target-dir=/var/lib/mysql/

.

.

.

200306 12:16:07 completed OK!

You should see, again, that the process completed OK. Normally you would now copy the data back to the MySQL data directory, but we don't have to do that here since we streamed the backup directly into /var/lib/mysql. What we do want, though, is to ensure correct ownership of the files:

root@slave:~# chown -R mysql.mysql /var/lib/mysql

Now, let’s check the GTID coordinates of the backup. We will use them later when setting up the replication.

root@slave:~# cat /var/lib/mysql/xtrabackup_binlog_info

binlog.000007 195 53d96192-53f7-11ea-9c3c-080027c5bc64:1-9

OK, all seems to be good; let's start MySQL and proceed with configuring the replication:

root@slave:~# service mysql start

root@slave:~# mysql -ppass

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 8

Server version: 8.0.18-9 Percona Server (GPL), Release '9', Revision '53e606f'



Copyright (c) 2009-2019 Percona LLC and/or its affiliates

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.



Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



mysql>

Now we have to set gtid_purged to the GTID set that we found in the backup. Those are the GTIDs that have been "covered" by our backup; only new GTIDs should replicate from the master.

mysql> SET GLOBAL gtid_purged='53d96192-53f7-11ea-9c3c-080027c5bc64:1-9';

Query OK, 0 rows affected (0.00 sec)

Now we can start the replication:

mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.141', MASTER_USER='rpl_user', MASTER_PASSWORD='yIPpgNE4KE', MASTER_AUTO_POSITION=1;

Query OK, 0 rows affected, 2 warnings (0.02 sec)



mysql> START SLAVE;

Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.0.0.141

                  Master_User: rpl_user

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: binlog.000007

          Read_Master_Log_Pos: 380

               Relay_Log_File: relay-bin.000002

                Relay_Log_Pos: 548

        Relay_Master_Log_File: binlog.000007

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 380

              Relay_Log_Space: 750

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1001

                  Master_UUID: 53d96192-53f7-11ea-9c3c-080027c5bc64

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

           Master_Retry_Count: 86400

                  Master_Bind:

      Last_IO_Error_Timestamp:

     Last_SQL_Error_Timestamp:

               Master_SSL_Crl:

           Master_SSL_Crlpath:

           Retrieved_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:10

            Executed_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:1-10

                Auto_Position: 1

         Replicate_Rewrite_DB:

                 Channel_Name:

           Master_TLS_Version:

       Master_public_key_path:

        Get_master_public_key: 0

            Network_Namespace:

1 row in set (0.00 sec)

As you can see, our slave is replicating from its master.

How to Rebuild a MySQL Slave Using ClusterControl?

If you are a ClusterControl user, instead of going through this process you can rebuild the slave in just a couple of clicks. Initially we have a clear issue with the replication:

Our slave is not replicating properly due to an error.

All we have to do is to run the “Rebuild Replication Slave” job.

You will be presented with a dialog where you should pick a master node for the slave that you want to rebuild. Then, click on Proceed and you are all set. ClusterControl will rebuild the slave and set up the replication for you.

Shortly, depending on the data set size, you should see a working slave:

As you can see, with just a couple of clicks ClusterControl accomplished the task of rebuilding the inconsistent replication slave.

How High CPU Utilization Affects Database Performance


One of the indicators that our database is experiencing performance issues is high CPU utilization. High CPU usage often goes hand in hand with heavy disk I/O (where data is read from or written to the disk). In this blog we will take a close look at some of the parameters in the database and how they relate to the CPU cores of the server.
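
Before diving into the database, note that operating system tools from the sysstat package can help separate CPU-bound load from I/O wait; a quick sketch (the intervals and sample counts are arbitrary):

$ iostat -xm 5 3    # extended per-device I/O statistics, three 5-second samples
$ pidstat -u 5 3    # per-process CPU usage, three 5-second samples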

How to Confirm if Your CPU Utilization is High

If you think you are experiencing high CPU utilization, you should check the processes in the operating system to determine what is causing the issue. To do this, you can utilize the htop/top command:

[root@n1]# top

top - 14:10:35 up 39 days, 20:20,  2 users, load average: 0.30, 0.68, 1.13

Tasks: 461 total,   2 running, 459 sleeping,   0 stopped, 0 zombie

%Cpu(s):  0.7 us, 0.6 sy,  0.0 ni, 98.5 id, 0.1 wa,  0.0 hi, 0.0 si, 0.0 st

KiB Mem : 13145656+total,   398248 free, 12585008+used, 5208240 buff/cache

KiB Swap: 13421772+total, 11141702+free, 22800704 used.  4959276 avail Mem



  PID USER      PR NI VIRT    RES SHR S %CPU %MEM     TIME+ COMMAND

15799 mysql     20 0 145.6g 118.2g   6212 S 87.4 94.3 7184:35 mysqld

21362 root      20 0 36688 15788   2924 S 15.2 0.0 5805:21 node_exporter

From the above top command, the highest CPU utilization comes from the mysqld daemon. You can check inside the database itself to see what processes are running:

mysql> show processlist;

+-----+----------+-------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------------------------------------------------------------+-----------+---------------+

| Id  | User     | Host         | db | Command          | Time | State                               | Info   | Rows_sent | Rows_examined |

+-----+----------+-------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------------------------------------------------------------+-----------+---------------+

|  32 | rpl_user | 10.10.10.18:45338 | NULL               | Binlog Dump GTID | 4134 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                   | 0 | 0 |

|  47 | cmon     | 10.10.10.10:37214 | information_schema | Sleep            | 1 |           | NULL   | 492 | 984 |

|  50 | cmon     | 10.10.10.10:37390 | information_schema | Sleep            | 0 |           | NULL   | 0 | 0 |

|  52 | cmon     | 10.10.10.10:37502 | information_schema | Sleep            | 2 |           | NULL   | 1 | 599 |

| 429 | root     | localhost   | item | Query            | 4 | Creating sort index                               | select * from item where i_data like '%wjf%' order by i_data   | 0 | 0 |

| 443 | root     | localhost   | item | Query            | 2 | Creating sort index                               | select * from item where i_name like 'item-168%' order by i_price desc |         0 | 0 |

| 471 | root     | localhost   | NULL | Query            | 0 | starting                               | show processlist   | 0 | 0 |

+-----+----------+-------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------------------------------------------------------------+-----------+---------------+

7 rows in set (0.00 sec)

There are some running queries (as shown above); they are SELECT queries against the table item in the state Creating sort index. The Creating sort index state means the database is working out the order of the returned values based on the ORDER BY clause, constrained by the available CPU speed. So if you have limited CPU cores, this will impact such queries.

Enabling the slow query log is beneficial to check if there's an issue from the application side. You can enable the slow query log by setting two parameters in the database: slow_query_log and long_query_time. The slow_query_log parameter must be set to ON to enable the slow query log, and the long_query_time parameter is the threshold for flagging long-running queries. There is also one parameter related to the location of the slow query log file: slow_query_log_file can be set to any path to store the slow query log file.
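
For example, the slow query log can be enabled at runtime like this (the 2-second threshold and log file path are illustrative values):

mysql> SET GLOBAL slow_query_log = 'ON';
mysql> SET GLOBAL long_query_time = 2;
mysql> SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';

Once a suspicious query has been identified, you can analyze its execution plan with EXPLAIN: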

mysql> explain select * from item where i_data like '%wjf%' order by i_data;

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref | rows | filtered | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

|  1 | SIMPLE      | item | NULL     | ALL | NULL   | NULL | NULL | NULL | 9758658 |    11.11 | Using where; Using filesort |

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

1 row in set, 1 warning (0.00 sec)

The long-running queries (as shown above) use a full table scan as the access method. You can see this by checking the value of the type column; in this case it's ALL, which means the query scans all rows in the table to produce the data. The number of rows examined by the query is also quite large, around 9,758,658 rows, so of course it will spend more CPU time on the query.
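
A leading-wildcard pattern like '%wjf%' cannot use a B-tree index, but the other slow query above filters on the prefix pattern 'item-168%', so a common mitigation is to add an index the optimizer can use instead of a full scan (the index name below is a hypothetical choice):

mysql> ALTER TABLE item ADD INDEX idx_i_name (i_name);
mysql> EXPLAIN SELECT * FROM item WHERE i_name LIKE 'item-168%' ORDER BY i_price DESC;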

How ClusterControl Dashboards Identify High CPU Utilization

ClusterControl helps with your daily routine and tasks and lets you know if something is happening in your system, such as high CPU. You can check the dashboard directly for CPU utilization.

As you can see in the screenshot above, CPU utilization is quite high and there are significant spikes, even up to 87%. The graph does not tell us more, but you can always check with the 'top' command in the shell, or dig up the currently running processes on the server from the Top dashboard, as shown below:

As you can see from the Top dashboard, most of the CPU resources are taken by the mysqld daemon process. This mysqld process is the suspect consuming a lot of CPU resources. You can dig deeper into the database to check what is running; you can also check the running queries and top queries in the database through the dashboards shown below.

Running Queries Dashboard

From the Running Queries dashboard, you can see the executed query, the time spent on the query, and the state of the query itself. The user and db info relate to the user who executed the query and in which database.

Top Queries Dashboard

From the Top Queries dashboard, we can see the query that is causing the problem. The query scans 10,614,051 rows in the item table. The average execution time of the query is around 12.4 seconds, which is quite long.

Conclusion

Troubleshooting high CPU utilization is not that difficult; you just need to know what is currently running in the database server. With the help of the right tool, you can fix the problem immediately.

 

How to Fix a Lock Wait Timeout Exceeded Error in MySQL


One of the most popular InnoDB errors is the InnoDB lock wait timeout exceeded, for example:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

The above simply means the transaction has reached the innodb_lock_wait_timeout while waiting to obtain an exclusive lock; the timeout defaults to 50 seconds (a sketch for inspecting and adjusting it follows the list below). The common causes are:

  1. The offending transaction is not fast enough to commit or roll back within the innodb_lock_wait_timeout duration.
  2. The offending transaction is waiting for a row lock to be released by another transaction.
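
As a side note, the timeout itself is a dynamic variable at both GLOBAL and SESSION scope, so it can be inspected and adjusted at runtime (the 120-second value below is purely illustrative):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
mysql> SET GLOBAL innodb_lock_wait_timeout = 120;

Keep in mind that raising the timeout only masks the symptom; the sections below look at the actual effects and how to track down the blocking transaction.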

The Effects of an InnoDB Lock Wait Timeout

An InnoDB lock wait timeout can cause two major implications:

  • The failed statement is not being rolled back by default.
  • Even if innodb_rollback_on_timeout is enabled, when a statement fails in a transaction, ROLLBACK is still a more expensive operation than COMMIT.

Let's play around with a simple example to better understand the effect. Consider the following two tables in database mydb:

mysql> CREATE SCHEMA mydb;

mysql> USE mydb;

The first table (table1):

mysql> CREATE TABLE table1 ( id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(50));

mysql> INSERT INTO table1 SET data = 'data #1';

The second table (table2):

mysql> CREATE TABLE table2 LIKE table1;

mysql> INSERT INTO table2 SET data = 'data #2';

We executed our transactions in two different sessions in the following order:

Ordering | Transaction #1 (T1)                                                   | Transaction #2 (T2)
---------+-----------------------------------------------------------------------+-----------------------------------------------------------------------
1        | SELECT * FROM table1; (OK)                                            | SELECT * FROM table1; (OK)
2        | UPDATE table1 SET data = 'T1 is updating the row' WHERE id = 1; (OK)  |
3        |                                                                       | UPDATE table2 SET data = 'T2 is updating the row' WHERE id = 1; (OK)
4        |                                                                       | UPDATE table1 SET data = 'T2 is updating the row' WHERE id = 1; (hangs for a while and eventually returns the error "Lock wait timeout exceeded; try restarting transaction")
5        | COMMIT; (OK)                                                          |
6        |                                                                       | COMMIT; (OK)

However, the end result after step #6 might be surprising if we did not retry the timed out statement at step #4:
mysql> SELECT * FROM table1 WHERE id = 1;

+----+-----------------------------------+

| id | data                              |

+----+-----------------------------------+

| 1  | T1 is updating the row            |

+----+-----------------------------------+



mysql> SELECT * FROM table2 WHERE id = 1;

+----+-----------------------------------+

| id | data                              |

+----+-----------------------------------+

| 1  | T2 is updating the row            |

+----+-----------------------------------+

After T2 was successfully committed, one would expect to get the same output "T2 is updating the row" for both table1 and table2, but the results show that only table2 was updated. One might think that if any error is encountered within a transaction, all statements in the transaction would automatically be rolled back, or that if a transaction is successfully committed, all of its statements were executed atomically. This is true for deadlocks, but not for the InnoDB lock wait timeout.

Unless you set innodb_rollback_on_timeout=1 (the default is 0 - disabled), automatic rollback is not going to happen on an InnoDB lock wait timeout error. This means that, with the default setting, MySQL is not going to fail and roll back the whole transaction, nor retry the timed-out statement; it just processes the next statements until it reaches COMMIT or ROLLBACK. This explains why transaction T2 was partially committed!

The InnoDB documentation clearly says "InnoDB rolls back only the last statement on a transaction timeout by default". In this case, we do not get the transaction atomicity offered by InnoDB. Atomicity in ACID compliance means we get either all of the transaction or nothing; a partial transaction is simply unacceptable.

Dealing With an InnoDB Lock Wait Timeout

So, if you expect a transaction to auto-rollback when it encounters an InnoDB lock wait error, similar to what would happen in a deadlock, set the following option in the MySQL configuration file:

innodb_rollback_on_timeout=1

A MySQL restart is required. When deploying a MySQL-based cluster, ClusterControl will always set innodb_rollback_on_timeout=1 on every node. Without this option, your application has to retry the failed statement, or perform ROLLBACK explicitly to maintain the transaction atomicity.

To verify if the configuration is loaded correctly:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_rollback_on_timeout';

+----------------------------+-------+

| Variable_name              | Value |

+----------------------------+-------+

| innodb_rollback_on_timeout | ON    |

+----------------------------+-------+

To check whether the new configuration works, we can track the Com_rollback counter when this error happens:

mysql> SHOW GLOBAL STATUS LIKE 'com_rollback';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| Com_rollback  | 1 |

+---------------+-------+

Tracking the Blocking Transaction

There are several places that we can look to track the blocking transaction or statements. Let's start by looking into InnoDB engine status under TRANSACTIONS section:

mysql> SHOW ENGINE INNODB STATUS\G

------------

TRANSACTIONS

------------

...

---TRANSACTION 3100, ACTIVE 2 sec starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)

MySQL thread id 50, OS thread handle 139887555282688, query id 360 localhost ::1 root updating

update table1 set data = 'T2 is updating the row' where id = 1

------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 6 page no 4 n bits 72 index PRIMARY of table `mydb`.`table1` trx id 3100 lock_mode X locks rec but not gap waiting

Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0

 0: len 4; hex 80000001; asc     ;;

 1: len 6; hex 000000000c19; asc       ;;

 2: len 7; hex 020000011b0151; asc       Q;;

 3: len 22; hex 5431206973207570646174696e672074686520726f77; asc T1 is updating the row;;



------------------

---TRANSACTION 3097, ACTIVE 46 sec

2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1

MySQL thread id 48, OS thread handle 139887556167424, query id 358 localhost ::1 root

Trx read view will not see trx with id >= 3097, sees < 3097

From the above information, we can get an overview of the transactions that are currently active in the server. Transaction 3097 is currently locking a row that needs to be accessed by transaction 3100. However, the above output does not tell us the actual query text that could help us figure out which part of the query/statement/transaction we need to investigate further. Using the blocker MySQL thread ID 48, let's see what we can gather from the MySQL processlist:

mysql> SHOW FULL PROCESSLIST;

+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+
| Id | User            | Host            | db                 | Command | Time | State                  | Info                  |
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+
| 4  | event_scheduler | localhost       | <null>             | Daemon  | 5146 | Waiting on empty queue | <null>                |
| 10 | root            | localhost:56042 | performance_schema | Query   | 0    | starting               | show full processlist |
| 48 | root            | localhost:56118 | mydb               | Sleep   | 145  |                        | <null>                |
| 50 | root            | localhost:56122 | mydb               | Sleep   | 113  |                        | <null>                |
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+

Thread ID 48 shows the command as 'Sleep'. Still, this does not help us much to know which statement is blocking the other transaction. This is because the statement in this transaction has already been executed and the open transaction is basically doing nothing at the moment. We need to dive further down to see what is going on with this thread.

For MySQL 8.0, the InnoDB lock wait instrumentation is available under data_lock_waits table inside performance_schema database (or innodb_lock_waits table inside sys database). If a lock wait event is happening, we should see something like this:

mysql> SELECT * FROM performance_schema.data_lock_waits\G

***************************[ 1. row ]***************************

ENGINE                           | INNODB

REQUESTING_ENGINE_LOCK_ID        | 139887595270456:6:4:2:139887487554680

REQUESTING_ENGINE_TRANSACTION_ID | 3100

REQUESTING_THREAD_ID             | 89

REQUESTING_EVENT_ID              | 8

REQUESTING_OBJECT_INSTANCE_BEGIN | 139887487554680

BLOCKING_ENGINE_LOCK_ID          | 139887595269584:6:4:2:139887487548648

BLOCKING_ENGINE_TRANSACTION_ID   | 3097

BLOCKING_THREAD_ID               | 87

BLOCKING_EVENT_ID                | 9

BLOCKING_OBJECT_INSTANCE_BEGIN   | 139887487548648

Note that in MySQL 5.6 and 5.7, similar information is stored in the innodb_lock_waits table under the information_schema database. Pay attention to the BLOCKING_THREAD_ID value. We can use this information to look for all statements executed by this thread in the events_statements_history table:

mysql> SELECT * FROM performance_schema.events_statements_history WHERE `THREAD_ID` = 87;

0 rows in set

It looks like the thread information is no longer there. We can verify by checking the minimum and maximum value of the thread_id column in events_statements_history table with the following query:

mysql> SELECT min(`THREAD_ID`), max(`THREAD_ID`) FROM performance_schema.events_statements_history;

+------------------+------------------+
| min(`THREAD_ID`) | max(`THREAD_ID`) |
+------------------+------------------+
| 98               | 129              |
+------------------+------------------+

The thread that we were looking for (87) has been evicted from the table. We can confirm this by looking at the size of the events_statements_history table:

mysql> SELECT @@performance_schema_events_statements_history_size;

+-----------------------------------------------------+

| @@performance_schema_events_statements_history_size |

+-----------------------------------------------------+

| 10                                                  |

+-----------------------------------------------------+

The above means that events_statements_history only keeps the most recent 10 statements per thread. Fortunately, performance_schema has another table called events_statements_history_long, which stores similar information for all threads globally and can contain many more rows:

mysql> SELECT @@performance_schema_events_statements_history_long_size;

+----------------------------------------------------------+

| @@performance_schema_events_statements_history_long_size |

+----------------------------------------------------------+

| 10000                                                    |

+----------------------------------------------------------+

However, you will get an empty result if you try to query the events_statements_history_long table for the first time. This is expected because by default, this instrumentation is disabled in MySQL as we can see in the following setup_consumers table:

mysql> SELECT * FROM performance_schema.setup_consumers;

+----------------------------------+---------+
| NAME                             | ENABLED |
+----------------------------------+---------+
| events_stages_current            | NO      |
| events_stages_history            | NO      |
| events_stages_history_long       | NO      |
| events_statements_current        | YES     |
| events_statements_history        | YES     |
| events_statements_history_long   | NO      |
| events_transactions_current      | YES     |
| events_transactions_history      | YES     |
| events_transactions_history_long | NO      |
| events_waits_current             | NO      |
| events_waits_history             | NO      |
| events_waits_history_long        | NO      |
| global_instrumentation           | YES     |
| thread_instrumentation           | YES     |
| statements_digest                | YES     |
+----------------------------------+---------+

To activate table events_statements_history_long, we need to update the setup_consumers table as below:
mysql> UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name = 'events_statements_history_long';
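Keep in mind that changes made through setup_consumers do not survive a server restart. To enable the consumer permanently, you can set it at startup in the MySQL configuration file, for example:

[mysqld]
performance-schema-consumer-events-statements-history-long = ON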

Verify if there are rows in the events_statements_history_long table now:

mysql> SELECT count(`THREAD_ID`) FROM performance_schema.events_statements_history_long;

+--------------------+

| count(`THREAD_ID`) |

+--------------------+

| 4                  |

+--------------------+

Cool. Now we can wait until the InnoDB lock wait event occurs again, and when it does, you should see the following row in the data_lock_waits table:

mysql> SELECT * FROM performance_schema.data_lock_waits\G

***************************[ 1. row ]***************************

ENGINE                           | INNODB

REQUESTING_ENGINE_LOCK_ID        | 139887595270456:6:4:2:139887487555024

REQUESTING_ENGINE_TRANSACTION_ID | 3083

REQUESTING_THREAD_ID             | 60

REQUESTING_EVENT_ID              | 9

REQUESTING_OBJECT_INSTANCE_BEGIN | 139887487555024

BLOCKING_ENGINE_LOCK_ID          | 139887595269584:6:4:2:139887487548648

BLOCKING_ENGINE_TRANSACTION_ID   | 3081

BLOCKING_THREAD_ID               | 57

BLOCKING_EVENT_ID                | 8

BLOCKING_OBJECT_INSTANCE_BEGIN   | 139887487548648

Again, we use the BLOCKING_THREAD_ID value to filter all statements that have been executed by this thread against the events_statements_history_long table:

mysql> SELECT `THREAD_ID`,`EVENT_ID`,`EVENT_NAME`, `CURRENT_SCHEMA`,`SQL_TEXT` FROM events_statements_history_long 

WHERE `THREAD_ID` = 57

ORDER BY `EVENT_ID`;

+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME            | CURRENT_SCHEMA | SQL_TEXT                                                       |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| 57        | 1        | statement/sql/select  | <null>         | select connection_id()                                         |
| 57        | 2        | statement/sql/select  | <null>         | SELECT @@VERSION                                               |
| 57        | 3        | statement/sql/select  | <null>         | SELECT @@VERSION_COMMENT                                       |
| 57        | 4        | statement/com/Init DB | <null>         | <null>                                                         |
| 57        | 5        | statement/sql/begin   | mydb           | begin                                                          |
| 57        | 7        | statement/sql/select  | mydb           | select 'T1 is in the house'                                    |
| 57        | 8        | statement/sql/select  | mydb           | select * from table1                                           |
| 57        | 9        | statement/sql/select  | mydb           | select 'some more select'                                      |
| 57        | 10       | statement/sql/update  | mydb           | update table1 set data = 'T1 is updating the row' where id = 1 |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+

Finally, we found the culprit. Looking at the sequence of events for thread 57, we can tell that the above transaction (T1) has not finished yet (no COMMIT or ROLLBACK), and that its very last statement obtained an exclusive lock on the row needed by the other transaction (T2) for its update operation, and is simply hanging there. That explains why we see 'Sleep' in the MySQL processlist output.
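If the offending connection cannot be fixed on the application side right away, a heavy-handed but effective workaround is to terminate the blocking connection using its processlist ID (48 in our example), which forces MySQL to roll back its open transaction and release the lock:

mysql> KILL 48;

Use this with care, since the application behind that connection will lose whatever work the transaction had done.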

As we can see, the SELECT statement that we used against events_statements_history_long requires you to get the thread_id value beforehand. To simplify this query, we can use an IN clause with a subquery to join both tables. The following query produces a result identical to the above:

mysql> SELECT `THREAD_ID`,`EVENT_ID`,`EVENT_NAME`, `CURRENT_SCHEMA`,`SQL_TEXT` from events_statements_history_long WHERE `THREAD_ID` IN (SELECT `BLOCKING_THREAD_ID` FROM data_lock_waits) ORDER BY `EVENT_ID`;

+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME            | CURRENT_SCHEMA | SQL_TEXT                                                       |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| 57        | 1        | statement/sql/select  | <null>         | select connection_id()                                         |
| 57        | 2        | statement/sql/select  | <null>         | SELECT @@VERSION                                               |
| 57        | 3        | statement/sql/select  | <null>         | SELECT @@VERSION_COMMENT                                       |
| 57        | 4        | statement/com/Init DB | <null>         | <null>                                                         |
| 57        | 5        | statement/sql/begin   | mydb           | begin                                                          |
| 57        | 7        | statement/sql/select  | mydb           | select 'T1 is in the house'                                    |
| 57        | 8        | statement/sql/select  | mydb           | select * from table1                                           |
| 57        | 9        | statement/sql/select  | mydb           | select 'some more select'                                      |
| 57        | 10       | statement/sql/update  | mydb           | update table1 set data = 'T1 is updating the row' where id = 1 |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+

However, it is not practical for us to execute the above query whenever an InnoDB lock wait event occurs. Apart from the error reported by the application, how would you know that a lock wait event is happening? We can automate this query execution with the following simple Bash script, called track_lockwait.sh:

$ cat track_lockwait.sh
#!/bin/bash
## track_lockwait.sh
## Print out the blocking statements that cause InnoDB lock waits

INTERVAL=5
DIR=/root/lockwait/

[ -d $DIR ] || mkdir -p $DIR

while true; do
    check_query=$(mysql -A -Bse 'SELECT THREAD_ID,EVENT_ID,EVENT_NAME,CURRENT_SCHEMA,SQL_TEXT FROM performance_schema.events_statements_history_long WHERE THREAD_ID IN (SELECT BLOCKING_THREAD_ID FROM performance_schema.data_lock_waits) ORDER BY EVENT_ID')

    # if $check_query is not empty, write a timestamped report file
    if [[ ! -z $check_query ]]; then
        timestamp=$(date +%s)
        echo "$check_query" > $DIR/innodb_lockwait_report_${timestamp}
    fi

    sleep $INTERVAL
done

Apply executable permission and daemonize the script in the background:

$ chmod 755 track_lockwait.sh

$ nohup ./track_lockwait.sh &

Now, we just need to wait for the reports to be generated under the /root/lockwait directory. Depending on the database workload and row access patterns, you may see a lot of files under this directory. Monitor it closely, otherwise it will be flooded with report files.
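If you prefer the script to start automatically at boot instead of running it under nohup, a minimal systemd unit sketch could look like the following (assuming the script lives at /root/track_lockwait.sh; adapt the paths and the MySQL service name to your setup):

$ cat /etc/systemd/system/track_lockwait.service
[Unit]
Description=Track InnoDB lock wait blocking statements
After=mysql.service

[Service]
ExecStart=/root/track_lockwait.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target

$ systemctl daemon-reload
$ systemctl enable --now track_lockwait.service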

If you are using ClusterControl, you can enable the Transaction Log feature under Performance -> Transaction Log, where ClusterControl provides reports on deadlocks and long-running transactions, which makes finding the culprit much easier.

Conclusion

It is really important to enable innodb_rollback_on_timeout if your application does not handle the InnoDB lock wait timeout error properly. Otherwise, you might lose the transaction atomicity, and tracking down the culprit is not a straightforward task.

An Overview of Generated Columns for PostgreSQL


PostgreSQL 12 comes with a great new feature, Generated Columns. The functionality isn’t exactly anything new, but the standardization, ease of use, accessibility, and performance have been improved in this new version.

A Generated Column is a special column in a table that contains data automatically generated from other data within the row. The content of the generated column is automatically populated and updated whenever the source data, such as any other columns in the row, are changed themselves.

Generated Columns in PostgreSQL 12+

In recent versions of PostgreSQL, generated columns are a built-in feature allowing the CREATE TABLE or ALTER TABLE statements to add a column in which the content is automatically ‘generated’ as the result of an expression. These expressions could be simple mathematical operations on other columns, or a more advanced immutable function. Some benefits of implementing a generated column into a database design include:

  • The ability to add a column to a table containing computed data, without needing to update application code to generate that data and include it in INSERT and UPDATE operations. 
  • Reducing processing time on extremely frequent SELECT statements that would otherwise compute the data on the fly. Since the processing is done at INSERT or UPDATE time, the data is generated once and SELECT statements only need to retrieve it. In heavy read environments, this may be preferable, as long as the extra data storage used is acceptable. 
  • Since generated columns are updated automatically when the source data itself is updated, adding a generated column guarantees that the data in the generated column is always correct. 

In PostgreSQL 12, only the ‘STORED’ type of generated column is available. In other database systems, a generated column with a type ‘VIRTUAL’ is available, which acts more like a view where the result is calculated on the fly when the data is retrieved. Since that functionality is so similar to a view, or to simply writing the operation into a SELECT statement, it isn’t as beneficial as the ‘STORED’ variety discussed here, but there’s a chance future versions will include the feature. 

Creating a table with a generated column is done when defining the column itself. In this example, the generated column is ‘profit’, and it is automatically generated by subtracting the purchase_price from the sale_price column, then multiplying by the quantity_sold column. 

CREATE TABLE public.transactions (

    transactions_sid serial primary key,

    transaction_date timestamp with time zone DEFAULT now() NOT NULL,

    product_name character varying NOT NULL,

    purchase_price double precision NOT NULL,

    sale_price double precision NOT NULL,

    quantity_sold integer NOT NULL,

    profit double precision NOT NULL GENERATED ALWAYS AS  ((sale_price - purchase_price) * quantity_sold) STORED

);

In this example, a ‘transactions’ table is created to track some basic transactions and profits of an imaginary coffee shop. Inserting data into this table will show some immediate results.

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);



severalnines=# SELECT * FROM public.transactions;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                1 | 2020-02-28 04:50:06.626371+00 | House Blend Coffee             |              5 |      11.99 |             1 |   6.99
                2 | 2020-02-28 04:50:53.313572+00 | French Roast Coffee            |              6 |      12.99 |             4 |  27.96
                3 | 2020-02-28 04:51:08.531875+00 | BULK: House Blend Coffee, 10LB |             40 |        100 |             6 |    360

When updating the row, the generated column will automatically update:

severalnines=# UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;

UPDATE 1



severalnines=# SELECT * FROM public.transactions WHERE transactions_sid = 3;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                3 | 2020-02-28 05:55:11.233077+00 | BULK: House Blend Coffee, 10LB |             40 |         95 |             6 |    330

This will ensure that the generated column is always correct, with no additional logic needed on the application side. 
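Because a STORED generated column is physically kept on disk like any ordinary column, it can also be indexed, which helps when the computed value is frequently filtered or sorted on. A quick sketch (the index name here is just illustrative):

severalnines=# CREATE INDEX idx_transactions_profit ON public.transactions (profit);
CREATE INDEX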

NOTE: Generated columns cannot be INSERTed into or UPDATEd directly, and any attempt to do so will result in an ERROR:

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold, profit) VALUES ('BULK: House Blend Coffee, 10LB', 40, 95, 1, 95);

ERROR:  cannot insert into column "profit"

DETAIL:  Column "profit" is a generated column.



severalnines=# UPDATE public.transactions SET profit = 330 WHERE transactions_sid = 3;

ERROR:  column "profit" can only be updated to DEFAULT

DETAIL:  Column "profit" is a generated column.

Generated Columns on PostgreSQL 11 and Before

Even though built-in generated columns are new to version 12 of PostgreSQL, the functionality can still be achieved in earlier versions; it just needs a bit more setup with stored procedures and triggers. Even with the ability to implement it on older versions, however, strict data input compliance is harder to achieve, and depends on PL/pgSQL features and programming ingenuity. 

BONUS: The below example will also work on PostgreSQL 12+, so if the added functionality with a function / trigger combo is needed or desired in newer versions, this option is a valid fallback and not restricted to just versions older than 12. 

While this is a way to do it on previous versions of PostgreSQL, there are a couple of additional benefits of this method: 

  • Since mimicking the generated column uses a function, more complex calculations can be used. Generated columns in version 12 require IMMUTABLE operations, but a trigger / function option could use a STABLE or VOLATILE type of function, with greater possibilities and likely lower performance accordingly. 
  • Using a function that has the option to be STABLE or VOLATILE also opens up the possibility to UPDATE additional columns, UPDATE other tables, or even create new data via INSERTs into other tables. (However, while these trigger / function options are much more flexible, that’s not to say an actual “Generated Column” is lacking, as it does what’s advertised with greater performance and efficiency.)

In this example, a trigger / function is set up to mimic the functionality of a PostgreSQL 12+ generated column, along with two pieces that raise an exception if an INSERT or UPDATE attempts to change the generated column. These pieces can be omitted, but then no exception will be raised and the actual data INSERTed or UPDATEd will be quietly discarded, which generally wouldn’t be recommended. 

The trigger itself is set to run BEFORE, which means the processing happens before the actual insert, and it requires the RETURN of NEW, the RECORD that is modified to contain the new generated column value. This specific example was written to run on PostgreSQL version 11.

CREATE TABLE public.transactions (

    transactions_sid serial primary key,

    transaction_date timestamp with time zone DEFAULT now() NOT NULL,

    product_name character varying NOT NULL,

    purchase_price double precision NOT NULL,

    sale_price double precision NOT NULL,

    quantity_sold integer NOT NULL,

    profit double precision NOT NULL

);



CREATE OR REPLACE FUNCTION public.generated_column_function()

 RETURNS trigger

 LANGUAGE plpgsql

 IMMUTABLE

AS $function$

BEGIN



    -- This statement mimics the ERROR on built in generated columns to refuse INSERTS on the column and return an ERROR.

    IF (TG_OP = 'INSERT') THEN

        IF (NEW.profit IS NOT NULL) THEN

            RAISE EXCEPTION 'ERROR:  cannot insert into column "profit"' USING DETAIL = 'Column "profit" is a generated column.';

        END IF;

    END IF;



    -- This statement mimics the ERROR on built in generated columns to refuse UPDATES on the column and return an ERROR.

    IF (TG_OP = 'UPDATE') THEN

        -- Below, IS DISTINCT FROM is used because it treats nulls like an ordinary value. 

        IF (NEW.profit::VARCHAR IS DISTINCT FROM OLD.profit::VARCHAR) THEN

            RAISE EXCEPTION 'ERROR:  cannot update column "profit"' USING DETAIL = 'Column "profit" is a generated column.';

        END IF;

    END IF;



    NEW.profit := ((NEW.sale_price - NEW.purchase_price) * NEW.quantity_sold);

    RETURN NEW;



END;

$function$;




CREATE TRIGGER generated_column_trigger BEFORE INSERT OR UPDATE ON public.transactions FOR EACH ROW EXECUTE PROCEDURE public.generated_column_function();

NOTE: Make sure the function has the correct permissions / ownership to be executed by the desired application user(s).

As seen in the previous example, the results are the same in previous versions with a function / trigger solution:

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);



severalnines=# SELECT * FROM public.transactions;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                1 | 2020-02-28 00:35:14.855511-07 | House Blend Coffee             |              5 |      11.99 |             1 |   6.99
                2 | 2020-02-28 00:35:21.764449-07 | French Roast Coffee            |              6 |      12.99 |             4 |  27.96
                3 | 2020-02-28 00:35:27.708761-07 | BULK: House Blend Coffee, 10LB |             40 |        100 |             6 |    360

Updating the data will be similar. 

severalnines=# UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;

UPDATE 1



severalnines=# SELECT * FROM public.transactions WHERE transactions_sid = 3;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                3 | 2020-02-28 00:48:52.464344-07 | BULK: House Blend Coffee, 10LB |             40 |         95 |             6 |    330

Lastly, attempting to INSERT into, or UPDATE the special column itself will result in an ERROR:

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold, profit) VALUES ('BULK: House Blend Coffee, 10LB', 40, 95, 1, 95);

ERROR:  ERROR: cannot insert into column "profit"

DETAIL:  Column "profit" is a generated column.

CONTEXT:  PL/pgSQL function generated_column_function() line 7 at RAISE



severalnines=# UPDATE public.transactions SET profit = 3030 WHERE transactions_sid = 3;

ERROR:  ERROR: cannot update column "profit"

DETAIL:  Column "profit" is a generated column.

CONTEXT:  PL/pgSQL function generated_column_function() line 15 at RAISE

In this example, it does act differently from the first generated column setup in a couple of ways that should be noted:

  • If an UPDATE is attempted on the ‘generated column’ but no row is found to update, it will return success with an “UPDATE 0” result (demonstrated below), while an actual generated column in version 12 will still return an ERROR, even if no row is found to UPDATE.
  • When attempting to update the profit column, which ‘should’ always return an ERROR, the statement will succeed if the specified value is the same as the correctly ‘generated’ value. Ultimately the data is still correct, but this is a difference to keep in mind if the desire is to always return an ERROR when the column is specified.
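To illustrate the first difference, an UPDATE against the mimicked column that matches no rows succeeds quietly, because the BEFORE ... FOR EACH ROW trigger never fires (transactions_sid 999 is assumed not to exist here):

severalnines=# UPDATE public.transactions SET profit = 10 WHERE transactions_sid = 999;
UPDATE 0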

Documentation and PostgreSQL Community

The official documentation for the PostgreSQL Generated Columns is located at the official PostgreSQL Website. Check back when new major versions of PostgreSQL are released to discover new features when they appear. 

While generated columns in PostgreSQL 12 are fairly straightforward, implementing similar functionality in previous versions has the potential to get much more complicated. The PostgreSQL community is a very active, massive, worldwide, and multilingual community dedicated to helping people of any level of PostgreSQL experience solve problems and create new solutions such as this. 

  • IRC: Freenode has a very active channel called #postgres, where users help each other understand concepts, fix errors, or find other resources. A full list of available freenode channels for all things PostgreSQL can be found on the PostgreSQL.org website.
  • Mailing Lists: PostgreSQL has a handful of mailing lists that can be joined. Longer form questions / issues can be sent here, and can reach many more people than IRC at any given time. The lists can be found on the PostgreSQL Website, and the lists pgsql-general or pgsql-admin are good resources. 
  • Slack: The PostgreSQL community has also been thriving on Slack, and can be joined at postgresteam.slack.com. Much like IRC, an active community is available to answer questions and engage in all things PostgreSQL.

An Overview of Client-Side Field Level Encryption in MongoDB


Data often requires high-end security at nearly every level of the data transaction in order to meet security policies, compliance requirements, and government regulations. An organization's reputation may be wrecked if there is unauthorized access to sensitive data or a failure to comply with the outlined mandates. 

In this blog we will be discussing some of the security measures you can employ in regards to MongoDB, especially focusing on the client side of things.

Scenarios Where Data May Be Accessed

There are several ways someone can access your MongoDB data; here are some of them...

  1. Capture of data over an insecure network. Someone can access your data through an API with a VPN network and it will be difficult to track them down. Data in transit is often the culprit in this case.
  2. A super user such as an administrator having direct access. This happens when you fail to define user roles and restrictions.
  3. Having access to on-disk data while reading databases or backup files.
  4. Reading the server memory and logged data.
  5. Accidental disclosure of data by a staff member.

MongoDB Data Categories and How They are Secured

In general, any database system involves two types of data: 

  1. Data-at-rest: data that is stored in the database files
  2. Data-in-transit: data that is transacted between a client, a server and the database.

MongoDB has an Encryption at Rest feature that encrypts database files on disk hence preventing access to on-disk database files.

Data-in-transit over a network can be secured in MongoDB through Transport Encryption using TLS/SSL by encrypting the data.

In the case of data being accidentally disclosed by a staff member, for instance on a receptionist's desktop screen, MongoDB integrates Role-Based Access Control, which allows administrators to grant and restrict collection-level permissions for users.

Data transacted through the server may remain in memory, and none of the approaches above addresses the concern of data being accessed in server memory. MongoDB therefore introduced Client-Side Field Level Encryption for encrypting specific fields of a document that involve confidential data.

Field Level Encryption

MongoDB works with documents that have defined fields. Some fields may be required to hold confidential information such as credit card numbers, social security numbers, patient diagnosis data and much more.

Field Level Encryption will enable us to secure these fields so they can only be accessed by authorized personnel holding the decryption keys.

Encryption can be done in two ways:

  1. Using a secret key. A single key is used for both encrypting and decrypting, hence it has to be available at both the source and the destination of the transmission, but kept secret by all parties.
  2. Using a public key. Uses a pair of keys, whereby one is used to encrypt and the other to decrypt.

When applying Field Level Encryption consider using a new database setup rather than an existing one.

Client-Side Field Level Encryption (CSFLE)

Client-Side Field Level Encryption was introduced in MongoDB version 4.2 Enterprise to give database administrators a way to encrypt fields whose values need to be secured. That is to say, the sensitive data is encrypted and decrypted by the client and only communicated to and from the server in encrypted form. Moreover, even super users who don’t have the encryption keys will not be able to read these encrypted data fields.

How to Implement CSFLE

In order for you to implement the Client-Side Field Level Encryption, you require the following:

  1. MongoDB Server 4.2 Enterprise
  2. A MongoDB driver compatible with CSFLE
  3. File System Permissions
  4. Specific language drivers. (In our blog we are going to use Node.js)

The implementation procedure involves:

  • A local development environment with a software for running client and server
  • Generating and validating the encryption keys.
  • Configuring the client for automatic field-level encryption
  • Performing read and write operations (queries) against the encrypted fields.

CSFLE Implementation

CSFLE uses the envelope encryption strategy, whereby data encryption keys are encrypted with another key known as the master key. The client application creates a master key that is stored in the Local Key Provider, essentially the local file system. However, this storage approach is insecure, hence in production one is advised to configure the key in a Key Management System (KMS) that stores and decrypts data encryption keys remotely.
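As a rough sketch of what pointing the client at a remote KMS looks like with the Node.js driver (the credentials and key ARN below are placeholders, not working values):

// Assumption: AWS credentials are provided via environment variables.
const kmsProviders = {
  aws: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
};

// A data key would then be created against a customer master key (CMK):
// encryption.createDataKey('aws', {
//   masterKey: { key: '<your-cmk-arn>', region: '<your-region>' },
// });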

After the data encryption keys are generated, they are stored in the vault collection in the same MongoDB replica set as the encrypted data.

Create Master Key

In Node.js, we need to generate a 96-byte, locally managed master key and write it to a file in the directory where the main script is executed from. The fs and crypto modules used below are built into Node.js, so no extra packages need to be installed. In the script:

const crypto = require('crypto');
const fs = require('fs');

try {
  // 96 random bytes will serve as the locally managed master key
  fs.writeFileSync('masterKey.txt', crypto.randomBytes(96));
} catch (err) {
  throw err;
}

Create Data Encryption Key

This key is stored in a key vault collection where CSFLE-enabled clients can access the key for encryption/decryption. To generate one, you need the following:

  • Locally-managed master key
  • Connection to your database that is, the MongoDB connection string
  • Key vault namespace (database and collection)

Steps to Generate the Data Encryption Key

  1. Read the local master key generated before:

const localMasterKey = fs.readFileSync('./masterKey.txt');
  2. Specify the KMS provider settings that will be used by the client to discover the master key:

const kmsProvider = {
  local: {
    key: localMasterKey,
  },
};
  3. Create the Data Encryption Key. We need to create a client with the MongoDB connection string and the key vault namespace configuration. Let’s say we will have a database called users and, inside it, a keyVault collection. You need to install uuid-base64 first by running the command:

$ npm install uuid-base64

Then in your script

const base64 = require('uuid-base64');
const { MongoClient } = require('mongodb');
const { ClientEncryption } = require('mongodb-client-encryption');

const keyVaultNamespace = 'users.keyVault';
const client = new MongoClient('mongodb://localhost:27017', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
});

async function createKey() {
  try {
    await client.connect();
    const encryption = new ClientEncryption(client, {
      keyVaultNamespace,
      kmsProviders: kmsProvider, // the local KMS provider settings defined earlier
    });
    const key = await encryption.createDataKey('local');
    const base64DataKeyId = key.toString('base64');
    const uuidDataKeyId = base64.decode(base64DataKeyId);
    console.log('DataKeyId [UUID]: ', uuidDataKeyId);
    console.log('DataKeyId [base64]: ', base64DataKeyId);
  } finally {
    await client.close();
  }
}

createKey();

You will then be presented with a result that resembles:

DataKeyId [UUID]: ad4d735a-44789-48bc-bb93-3c81c3c90824

DataKeyId [base64]: 4K13FkSZSLy7kwABP4HQyD==

The client must have readWrite permissions on the specified key vault namespace.

  4. To verify that the Data Encryption Key was created:

const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
});

async function checkClient() {
  try {
    await client.connect();
    const keyDB = client.db('users');
    const keyColl = keyDB.collection('keyVault');
    const query = {
      _id: '4K13FkSZSLy7kwABP4HQyD==',
    };
    const dataKey = await keyColl.findOne(query);
    console.log(dataKey);
  } finally {
    await client.close();
  }
}

checkClient();

You should receive a result of this sort:

{

  _id: Binary {

    _bsontype: 'Binary',

    sub_type: 4,

    position: 2,

    buffer: <Buffer 68 ca d2 10 16 5d 45 bf 9d 1d 44 d4 91 a6 92 44>

  },

  keyMaterial: Binary {

    _bsontype: 'Binary',

    sub_type: 0,

    position: 20,

    buffer: <Buffer f1 4a 9f bd aa ac c9 89 e9 b3 da 48 72 8e a8 62 97 2a 4a a0 d2 d4 2d a8 f0 74 9c 16 4d 2c 95 34 19 22 05 05 84 0e 41 42 12 1e e3 b5 f0 b1 c5 a8 37 b8 ... 110 more bytes>

  },

  creationDate: 2020-02-08T11:10:20.021Z,

  updateDate: 2020-02-08T11:10:25.021Z,

  status: 0,

  masterKey: { provider: 'local' }

}

The returned document incorporates: the data encryption key id (UUID), the data encryption key in encrypted form, KMS provider information for the master key, and metadata such as the creation and update dates.

Specifying Fields to be Encrypted Using the JSON Schema

A JSON Schema extension is used by the MongoDB drivers to configure automatic client-side encryption and decryption of the specified fields of documents in a collection. The CSFLE configuration for this schema requires: the encryption algorithm to use for each field, the data encryption key(s) encrypted with the CSFLE master key, and the BSON type of each field. 

However, this CSFLE JSON schema does not support document validation otherwise any validation instances will cause the client to throw an error. 

Clients who are not configured with the appropriate client-side JSON Schema can be restricted from writing unencrypted data to a field by using the server-side JSON Schema. 

There are mainly two encryption algorithms: random and deterministic.

We will define an encryptMetadata key at the root level of the JSON Schema and configure it with the encryption key. The fields to be encrypted are then defined in the properties field of the schema, and they inherit this encryption key.

{

    "bsonType" : "object",

    "encryptMetadata" : {

        "keyId" : // keyId generated here

    },

    "properties": {

        // field schemas here

    }

}

Let’s say you want to encrypt a bank account number field, you would do something like:

"bankAccountNumber": {

    "encrypt": {

        "bsonType": "int",

        "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"

    }

}

Because of its high cardinality and the field being queryable, we use the deterministic approach here. Sensitive fields such as blood type, which are rarely queried and have low cardinality, may be encrypted using the random approach.

Array fields should use random encryption with CSFLE to enhance auto-encryption for all the elements. 
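To tie the schema to a client, the Node.js driver accepts an autoEncryption option when constructing the MongoClient. The sketch below assumes the kmsProvider object and data key from the earlier steps, and uses a hypothetical users.people collection and a peopleSchema variable as placeholders:

const { MongoClient } = require('mongodb');

const secureClient = new MongoClient('mongodb://localhost:27017', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
  autoEncryption: {
    keyVaultNamespace: 'users.keyVault',
    kmsProviders: kmsProvider,
    // peopleSchema is the JSON Schema built above, with encryptMetadata
    // and the per-field encrypt definitions
    schemaMap: {
      'users.people': peopleSchema,
    },
  },
});

Reads and writes issued through secureClient are then transparently encrypted and decrypted according to the schema.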

Mongocryptd Application

Installed with MongoDB Enterprise Server 4.2 and later, this is a separate encryption application that automates Client-Side Field Level Encryption. Whenever a CSFLE-enabled client is created, this service is automatically started by default to:

  • Validate the encryption instructions outlined in the JSON Schema, and detect which fields are to be encrypted in read and write operations.
  • Prevent unsupported operations from being executed on the encrypted fields.

To insert the data we issue a normal insert query, and the resulting document will contain data like the sample below for the bank account field.

{

…

"bankAccountNumber":"Ac+ZbPM+sk7gl7CJCcIzlRAQUJ+uo/0WhqX+KbTNdhqCszHucqXNiwqEUjkGlh7gK8pm2JhIs/P3//nkVP0dWu8pSs6TJnpfUwRjPfnI0TURzQ==",

…

}

When authorized personnel perform a query, the driver will decrypt this data and return it in a readable format, i.e.:

{

…

"bankAccountNumber":43265436456456456756,

…

}

Note:  It is not possible to query for documents on a randomly encrypted field unless you use another field to find the document that contains an approximation of the randomly encrypted field data.

Conclusion

Data security should be considered at all levels, both at rest and in transit. MongoDB Enterprise 4.2 Server offers developers a way to encrypt data from the client side using Client-Side Field Level Encryption, hence securing the data from database host providers and insecure network access. CSFLE uses envelope encryption, where a master key is used to encrypt the data encryption keys. The master key should therefore be kept safe using key management tools such as a Key Management System.

How to Install and Configure MaxScale for MariaDB


There are different reasons for adding a load balancer between your application and your database: if you have high traffic and you want to balance it between different database nodes, or if you want to use the load balancer as a single endpoint (so in case of failover, the load balancer sends the traffic to the available/healthy node). It could also be that you want to use different ports to write and read data from your database. 

In all these cases, a load balancer will be useful for you, and if you have a MariaDB cluster, one option for this is using MaxScale, which is a database proxy for MariaDB databases.

In this blog, we will show you how to install and configure it manually, and how ClusterControl can help you in this task. For this example, we will use a MariaDB replication cluster with 1 master and 1 slave node, and CentOS 8 as the operating system.

How to Install MaxScale

We will assume you have your MariaDB database up and running, and also a machine (virtual or physical) on which to install MaxScale. We recommend using a different host, so that in case of master failure MaxScale can failover to the slave node; MaxScale can’t take any action if the server where it is running goes down.

There are different ways to install MaxScale; in this case, we will use the MariaDB repositories. To add them on the MaxScale server, you have to run:

$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash

[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo

[info] Adding trusted package signing keys...

[info] Successfully added trusted package signing keys

Now, install the MaxScale package:

$ yum install maxscale

Now that you have your MaxScale node installed, you need to configure it before starting it.

How to Configure MaxScale

As MaxScale performs tasks like authentication, monitoring, and more, you need to create a database user with some specific privileges:

MariaDB [(none)]> CREATE USER 'maxscaleuser'@'%' IDENTIFIED BY 'maxscalepassword';

MariaDB [(none)]> GRANT SELECT ON mysql.user TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SELECT ON mysql.db TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SELECT ON mysql.tables_priv TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SELECT ON mysql.roles_mapping TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SHOW DATABASES ON *.* TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT REPLICATION CLIENT on *.* to 'maxscaleuser'@'%';

Keep in mind that MariaDB versions 10.2.2 to 10.2.10 also require:

MariaDB [(none)]> GRANT SELECT ON mysql.* TO 'maxscaleuser'@'%';

Now that you have the database user ready, let’s see the configuration file. When you install MaxScale, the file maxscale.cnf is created under /etc/. There are several variables and different ways to configure it, so let’s see an example:

$ cat  /etc/maxscale.cnf 

# Global parameters

[maxscale]

threads = auto

log_augmentation = 1

ms_timestamp = 1

syslog = 1



# Server definitions

[server1]

type=server

address=192.168.100.126

port=3306

protocol=MariaDBBackend

[server2]

type=server

address=192.168.100.127

port=3306

protocol=MariaDBBackend



# Monitor for the servers

[MariaDB-Monitor]

type=monitor

module=mariadbmon

servers=server1,server2

user=maxscaleuser

password=maxscalepassword

monitor_interval=2000



# Service definitions

[Read-Only-Service]

type=service

router=readconnroute

servers=server2

user=maxscaleuser

password=maxscalepassword

router_options=slave

[Read-Write-Service]

type=service

router=readwritesplit

servers=server1

user=maxscaleuser

password=maxscalepassword



# Listener definitions for the services

[Read-Only-Listener]

type=listener

service=Read-Only-Service

protocol=MariaDBClient

port=4008

[Read-Write-Listener]

type=listener

service=Read-Write-Service

protocol=MariaDBClient

port=4006

In this configuration, we have 2 database nodes, 192.168.100.126 (Master) and 192.168.100.127 (Slave), as you can see in the server definitions section.

We also have 2 different services: one for read-only, pointing to the slave node, and another one for read-write, pointing to the master node.

Finally, we have 2 listeners, one for each service: the read-only listener, listening on port 4008, and the read-write one, listening on port 4006.

This is a basic configuration file. If you need something more specific you can follow the official MariaDB documentation.

Now you are ready to start it, so just run:

$ systemctl start maxscale.service

And check it:

$ maxctrl list services
$ maxctrl list servers

You can find a maxctrl commands list here, or you can even use maxadmin to manage it.
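For more detail on any single object defined in our configuration, maxctrl can also show it individually, for example:

$ maxctrl show service Read-Write-Service

$ maxctrl show monitor MariaDB-Monitor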

Now let’s test the connection. For this, you can try to access your database using the MaxScale IP address and the port that you want to test. In our case, the traffic on port 4006 should be sent to server1, and the traffic on port 4008 to server2.

$ mysql -h 192.168.100.128 -umaxscaleuser -pmaxscalepassword -P4006 -e 'SELECT @@hostname;'

+------------+
| @@hostname |
+------------+
| server1    |
+------------+

$ mysql -h 192.168.100.128 -umaxscaleuser -pmaxscalepassword -P4008 -e 'SELECT @@hostname;'

+------------+
| @@hostname |
+------------+
| server2    |
+------------+

It works!

How to Deploy MaxScale with ClusterControl

Let’s see now, how you can use ClusterControl to simplify this task. For this, we will assume you have your MariaDB cluster added to ClusterControl.

Go to ClusterControl -> Select the MariaDB cluster -> Cluster Actions -> Add Load Balancer -> MaxScale.

Here you can deploy a new MaxScale node or you can also import an existing one. If you are deploying it, you need to add the IP address or hostname, the admin and user MaxScale credentials, the number of threads, and the ports (write and read-only). You can also specify which database nodes you want to add to the MaxScale configuration.

You can monitor the task in the ClusterControl Activity section. When it finishes, you will have a new MaxScale node in your MariaDB cluster.

You can also run the MaxScale commands from the ClusterControl UI, without needing to access the server via SSH.

It looks easier than deploying it manually, right?

Conclusion

Having a load balancer is a good solution if you want to balance or split your traffic, or even for failover actions, and MaxScale, as a MariaDB product, is a good option for MariaDB databases.

The installation is easy, but the configuration and usage could be difficult if it is something new for you. In that case, you can use ClusterControl to deploy, configure, and manage it in an easier way.


A Message from Our CEO on the COVID-19 Pandemic


We wanted to take a moment to address how Severalnines is handling the COVID-19 pandemic. We have employees scattered across 16 countries over 5 continents, which means that some of our employees are at higher risk than others. 

The good news is that Severalnines is a fully remote organization, so for the most part it’s business as usual for us here. Our sales and support teams are ready to provide support to our prospects and customers as needed and we expect to maintain our average response times and service level agreements. Our product team is actively working on the next release of ClusterControl which comes with new features and several bug fixes.

We have encouraged our employees to follow the recommendations made by their local governments and the World Health Organization (WHO). Several of our employees have been impacted by the closure of schools around the world. To Severalnines, families come first, so we encourage our teams to work when their schedules allow and to focus on the health and security of their family as the priority. We’re postponing our in-person events as safety comes top of mind.

To our customers & potential customers, if your database operations have been impacted by illness or freezes in hiring, please contact your account manager to see how we can help. The database automation features of ClusterControl can reduce maintenance time, improve availability, and keep you up-and-running. 

Severalnines is full of amazing people and we are here to help you in any way we can.

Stay safe, follow your region’s recommendations for staying healthy, and we’ll see you on the other side.

Sincerely,

Vinay Joosery, CEO Severalnines

How to Replace an Intermediate MySQL or MariaDB Master with a Binlog Server using MaxScale


Binary logs (binlogs) contain records of all changes to the databases. They are necessary for replication and can also be used to restore data after a backup. A binlog server is basically a binary log repository. You can think of it like a server with a dedicated purpose to retrieve binary logs from a master, while slave servers can connect to it like they would connect to a master server.

Some advantages of having a binlog server over an intermediate master to distribute the replication workload are:

  • You can switch to a new master server without the slaves noticing that the actual master server has changed. This allows for a more highly available replication setup where replication is high-priority.
  • Reduce the load on the master by having it serve only MaxScale’s binlog server instead of all the slaves.
  • The data in the binary log of an intermediate master is not a direct copy of the data that was received from the binary log of the real master. As such, if group commit is used, this can cause a reduction in the parallelism of the commits and a subsequent reduction in the performance of the slave servers.
  • An intermediate master has to re-execute every SQL statement, which potentially adds latency and lag to the replication chain.

In this blog post, we are going to look into how to replace an intermediate master (a slave host relay to other slaves in a replication chain) with a binlog server running on MaxScale for better scalability and performance.

Architecture

We basically have a 4-node MariaDB v10.4 replication setup with one MaxScale v2.3 sitting on top of the replication to distribute incoming queries. Only one slave is connected to a master (intermediate master) and the other slaves replicate from the intermediate master to serve read workloads, as illustrated in the following diagram.

We are going to turn the above topology into this:

Basically, we are going to remove the intermediate master role and replace it with a binlog server running on MaxScale. The intermediate master will be converted to a standard slave, just like other slave hosts. The binlog service will be listening on port 5306 on the MaxScale host. This is the port that all slaves will be connecting to for replication later on.

Configuring MaxScale as a Binlog Server

In this example, we already have a MaxScale sitting on top of our replication cluster, acting as a load balancer for our applications. If you don't have a MaxScale, you can use ClusterControl to deploy one: simply go to Cluster Actions -> Add Load Balancer -> MaxScale and fill in the necessary information as the following:

Before we get started, let's export the current MaxScale configuration into a text file for backup. MaxScale has a flag called --export-config for this purpose, but it must be executed as the maxscale user. Thus, the command to export is:

$ su -s /bin/bash -c '/bin/maxscale --export-config=/tmp/maxscale.cnf' maxscale

On the MariaDB master, create a replication slave user called 'maxscale_slave' to be used by the MaxScale and assign it with the following privileges:

$ mysql -uroot -p -h192.168.0.91 -P3306

CREATE USER 'maxscale_slave'@'%' IDENTIFIED BY 'BtF2d2Kc8H';

GRANT SELECT ON mysql.user TO 'maxscale_slave'@'%';

GRANT SELECT ON mysql.db TO 'maxscale_slave'@'%';

GRANT SELECT ON mysql.tables_priv TO 'maxscale_slave'@'%';

GRANT SELECT ON mysql.roles_mapping TO 'maxscale_slave'@'%';

GRANT SHOW DATABASES ON *.* TO 'maxscale_slave'@'%';

GRANT REPLICATION SLAVE ON *.* TO 'maxscale_slave'@'%';

For ClusterControl users, go to Manage -> Schemas and Users to create the necessary privileges.

Before we move further with the configuration, it's important to review the current state and topology of our backend servers:

$ maxctrl list servers

┌────────┬──────────────┬──────┬─────────────┬──────────────────────────────┬───────────┐
│ Server │ Address      │ Port │ Connections │ State                        │ GTID      │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_757 │ 192.168.0.90 │ 3306 │ 0           │ Master, Running              │ 0-38001-8 │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_758 │ 192.168.0.91 │ 3306 │ 0           │ Relay Master, Slave, Running │ 0-38001-8 │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_759 │ 192.168.0.92 │ 3306 │ 0           │ Slave, Running               │ 0-38001-8 │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_760 │ 192.168.0.93 │ 3306 │ 0           │ Slave, Running               │ 0-38001-8 │
└────────┴──────────────┴──────┴─────────────┴──────────────────────────────┴───────────┘

As we can see, the current master is DB_757 (192.168.0.90). Take note of this information, as we are going to set up the binlog server to replicate from this master.

Open the MaxScale configuration file at /etc/maxscale.cnf and add the following lines:

[replication-service]

type=service

router=binlogrouter

user=maxscale_slave

password=BtF2d2Kc8H

version_string=10.4.12-MariaDB-log

server_id=9999

master_id=9999

mariadb10_master_gtid=true

filestem=binlog

binlogdir=/var/lib/maxscale/binlogs

semisync=true # if semisync is enabled on the master



[binlog-server-listener]

type=listener

service=replication-service

protocol=MariaDBClient

port=5306

address=0.0.0.0

A bit of explanation: we are creating two components, a service and a listener. The service is where we define the binlog server characteristics and how it should run. Details on every option can be found here. In this example, our replication servers are running with semi-synchronous replication, thus we have to use semisync=true so it will connect to the master via the semi-sync replication method. The listener is where we map the listening port to the binlogrouter service inside MaxScale.
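One practical note: the directory referenced by binlogdir must exist and be writable by the maxscale user, so if you pick a custom location you may need to prepare it first:

$ mkdir -p /var/lib/maxscale/binlogs

$ chown -R maxscale:maxscale /var/lib/maxscale/binlogs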

Restart MaxScale to load the changes:

$ systemctl restart maxscale

Verify the binlog service is started via maxctrl (look at the State column):

$ maxctrl show service replication-service

Verify that MaxScale is now listening to a new port for the binlog service:

$ netstat -tulpn | grep maxscale

tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      4850/maxscale
tcp        0      0 0.0.0.0:3307            0.0.0.0:*               LISTEN      4850/maxscale
tcp        0      0 0.0.0.0:5306            0.0.0.0:*               LISTEN      4850/maxscale
tcp        0      0 127.0.0.1:8989          0.0.0.0:*               LISTEN      4850/maxscale

We are now ready to establish a replication link between MaxScale and the master.

Activating the Binlog Server

Log into the MariaDB master server and retrieve the current binlog file and position:

MariaDB> SHOW MASTER STATUS;

+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| binlog.000005 |     4204 |              |                  |
+---------------+----------+--------------+------------------+

Use BINLOG_GTID_POS function to get the GTID value:

MariaDB> SELECT BINLOG_GTID_POS("binlog.000005", 4204);

+----------------------------------------+

| BINLOG_GTID_POS("binlog.000005", 4204) |

+----------------------------------------+

| 0-38001-31                             |

+----------------------------------------+

Back on the MaxScale server, install the MariaDB client package:

$ yum install -y mysql-client

Connect to the binlog server listener on port 5306 as maxscale_slave user and establish a replication link to the designated master. Use the GTID value retrieved from the master:

(maxscale)$ mysql -u maxscale_slave -p'BtF2d2Kc8H' -h127.0.0.1 -P5306

MariaDB> SET @@global.gtid_slave_pos = '0-38001-31';

MariaDB> CHANGE MASTER TO MASTER_HOST = '192.168.0.90', MASTER_USER = 'maxscale_slave', MASTER_PASSWORD = 'BtF2d2Kc8H', MASTER_PORT=3306, MASTER_USE_GTID = slave_pos;

MariaDB> START SLAVE;

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

                 Slave_IO_State: Binlog Dump

                  Master_Host: 192.168.0.90

                  Master_User: maxscale_slave

                  Master_Port: 3306

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

             Master_Server_Id: 38001

             Master_Info_File: /var/lib/maxscale/binlogs/master.ini

      Slave_SQL_Running_State: Slave running

                  Gtid_IO_Pos: 0-38001-31

Note: The above output has been truncated to show only important lines.

Pointing Slaves to the Binlog Server

Now on mariadb2 and mariadb3 (the end slaves), change the master pointing to the MaxScale binlog server. Since we are running with semi-sync replication enabled, we have to turn them off first:

(mariadb2 & mariadb3)$ mysql -uroot -p

MariaDB> STOP SLAVE;

MariaDB> SET global rpl_semi_sync_master_enabled = 0; -- if semisync is enabled

MariaDB> SET global rpl_semi_sync_slave_enabled = 0; -- if semisync is enabled

MariaDB> CHANGE MASTER TO MASTER_HOST = '192.168.0.95', MASTER_USER = 'maxscale_slave', MASTER_PASSWORD = 'BtF2d2Kc8H', MASTER_PORT=5306, MASTER_USE_GTID = slave_pos;

MariaDB> START SLAVE;

MariaDB> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

                Slave_IO_State: Waiting for master to send event

                   Master_Host: 192.168.0.95

                   Master_User: maxscale_slave

                   Master_Port: 5306

              Slave_IO_Running: Yes

             Slave_SQL_Running: Yes

              Master_Server_Id: 9999

                    Using_Gtid: Slave_Pos

                   Gtid_IO_Pos: 0-38001-32

       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Note: The above output has been truncated to show only important lines.

Inside my.cnf, we have to comment out the following lines to keep semi-sync disabled in the future:

#loose_rpl_semi_sync_slave_enabled=ON

#loose_rpl_semi_sync_master_enabled=ON

At this point, the intermediate master (mariadb1) is still replicating from the master (mariadb0), while the other slaves are replicating from the binlog server. Our current topology can be illustrated as in the diagram below:

The final part is to change the master of the intermediate master (mariadb1) once all the slaves that used to attach to it are gone. The steps are basically the same as for the other slaves:

(mariadb1)$ mysql -uroot -p

MariaDB> STOP SLAVE;

MariaDB> SET global rpl_semi_sync_master_enabled = 0; -- if semisync is enabled

MariaDB> SET global rpl_semi_sync_slave_enabled = 0; -- if semisync is enabled

MariaDB> CHANGE MASTER TO MASTER_HOST = '192.168.0.95', MASTER_USER = 'maxscale_slave', MASTER_PASSWORD = 'BtF2d2Kc8H', MASTER_PORT=5306, MASTER_USE_GTID = slave_pos;

MariaDB> START SLAVE;

MariaDB> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

                Slave_IO_State: Waiting for master to send event

                   Master_Host: 192.168.0.95

                   Master_User: maxscale_slave

                   Master_Port: 5306

              Slave_IO_Running: Yes

             Slave_SQL_Running: Yes

              Master_Server_Id: 9999

                    Using_Gtid: Slave_Pos

                   Gtid_IO_Pos: 0-38001-32

Note: The above output has been truncated to show only important lines.

Don't forget to disable semi-sync replication in my.cnf as well:

#loose_rpl_semi_sync_slave_enabled=ON

#loose_rpl_semi_sync_master_enabled=ON

We can then verify that the binlog router service has more connections now via the maxctrl CLI:

$ maxctrl list services

┌─────────────────────┬────────────────┬─────────────┬───────────────────┬───────────────────────────────────┐
│ Service             │ Router         │ Connections │ Total Connections │ Servers                           │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ rw-service          │ readwritesplit │ 1           │ 1                 │ DB_757, DB_758, DB_759, DB_760    │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ rr-service          │ readconnroute  │ 1           │ 1                 │ DB_757, DB_758, DB_759, DB_760    │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ replication-service │ binlogrouter   │ 4           │ 51                │ binlog_router_master_host, DB_757 │
└─────────────────────┴────────────────┴─────────────┴───────────────────┴───────────────────────────────────┘

Also, common replication administration commands can be used against the MaxScale binlog server. For example, we can verify the connected slave hosts using this command:

(maxscale)$ mysql -u maxscale_slave -p'BtF2d2Kc8H' -h127.0.0.1 -P5306

MariaDB> SHOW SLAVE HOSTS;

+-----------+--------------+------+-----------+------------+
| Server_id | Host         | Port | Master_id | Slave_UUID |
+-----------+--------------+------+-----------+------------+
| 38003     | 192.168.0.92 | 3306 | 9999      |            |
| 38002     | 192.168.0.91 | 3306 | 9999      |            |
| 38004     | 192.168.0.93 | 3306 | 9999      |            |
+-----------+--------------+------+-----------+------------+

At this point, our topology is looking as what we anticipated:

Our migration from intermediate master setup to binlog server setup is now complete.

 

How to Restore a Specific Collection in MongoDB Using Logical Backup


Keeping backups of your database is one of the most important tasks in any production environment. It is the process of copying your data to some other place to keep it safe. This can be useful in recovery from emergency situations like database corruption or a database crashing beyond repair.

Apart from recovery, a backup can also be used to mimic a production database for testing an application in a different environment, or even to debug something that can not be done on the production database.

There are various methods of database backup that you can implement, from logical backups using tools embedded in the database (e.g., mysqldump, mongodump, pg_dump) to physical backups using third-party tools (e.g., xtrabackup, barman, pgbackrest, mongodb consistent backup).

Which method to use is often determined by how you would like to restore. For instance, assume you dropped a table or a collection by mistake. Unlikely as it might seem, it does happen. The fastest way to recover would be to restore just that table or collection, instead of having to restore the entire database.

Backup and Restore in MongoDB

Mongodump and mongorestore are the logical backup tools for MongoDB, akin to mysqldump in MySQL or pg_dump in PostgreSQL. Both utilities are included when you install MongoDB, and they dump data in BSON format. Mongodump is used to back up a database logically into dump files, while mongorestore is used for the restore operation.

The mongodump and mongorestore commands are easy to use, although there are a lot of options.

As we can see below, you can back up specific databases or collections. You can even take a point-in-time snapshot by including the oplog.
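For instance, here is a minimal sketch of common mongodump invocations; it reuses the mongoadmin user from the restore example later in this post, so treat the credentials and paths as placeholders for your own environment:

# Back up a single database (orders) into /root/dump
$ mongodump -umongoadmin --authenticationDatabase admin -d orders -o /root/dump

# Back up only one collection from that database
$ mongodump -umongoadmin --authenticationDatabase admin -d orders -c stock -o /root/dump

# Point-in-time snapshot of the whole instance; --oplog only applies
# to full-instance dumps and cannot be combined with -d or -c
$ mongodump -umongoadmin --authenticationDatabase admin --oplog -o /root/dump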

root@n2:~# mongodump --help

Usage:

  mongodump <options>



Export the content of a running server into .bson files.



Specify a database with -d and a collection with -c to only dump that database or collection.



See http://docs.mongodb.org/manual/reference/program/mongodump/ for more information.



general options:

      --help                                                print usage

      --version                                             print the tool version and exit



verbosity options:

  -v, --verbose=<level>                                     more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)

      --quiet                                               hide all log output



connection options:

  -h, --host=<hostname>                                     mongodb host to connect to (setname/host1,host2 for replica sets)

      --port=<port>                                         server port (can also use --host hostname:port)



kerberos options:

      --gssapiServiceName=<service-name>                    service name to use when authenticating using GSSAPI/Kerberos ('mongodb' by default)

      --gssapiHostName=<host-name>                          hostname to use when authenticating using GSSAPI/Kerberos (remote server's address by default)



ssl options:

      --ssl                                                 connect to a mongod or mongos that has ssl enabled

      --sslCAFile=<filename>                                the .pem file containing the root certificate chain from the certificate authority

      --sslPEMKeyFile=<filename>                            the .pem file containing the certificate and key

      --sslPEMKeyPassword=<password>                        the password to decrypt the sslPEMKeyFile, if necessary

      --sslCRLFile=<filename>                               the .pem file containing the certificate revocation list

      --sslAllowInvalidCertificates                         bypass the validation for server certificates

      --sslAllowInvalidHostnames                            bypass the validation for server name

      --sslFIPSMode                                         use FIPS mode of the installed openssl library



authentication options:

  -u, --username=<username>                                 username for authentication

  -p, --password=<password>                                 password for authentication

      --authenticationDatabase=<database-name>              database that holds the user's credentials

      --authenticationMechanism=<mechanism>                 authentication mechanism to use



namespace options:

  -d, --db=<database-name>                                  database to use

  -c, --collection=<collection-name>                        collection to use



uri options:

      --uri=mongodb-uri                                     mongodb uri connection string



query options:

  -q, --query=                                              query filter, as a JSON string, e.g., '{x:{$gt:1}}'

      --queryFile=                                          path to a file containing a query filter (JSON)

      --readPreference=<string>|<json>                      specify either a preference name or a preference json object

      --forceTableScan                                      force a table scan



output options:

  -o, --out=<directory-path>                                output directory, or '-' for stdout (defaults to 'dump')

      --gzip                                                compress archive or collection output with Gzip

      --repair                                              try to recover documents from damaged data files (not supported by all storage engines)

      --oplog                                               use oplog for taking a point-in-time snapshot

      --archive=<file-path>                                 dump as an archive to the specified path. If flag is specified without a value, archive is written to stdout

      --dumpDbUsersAndRoles                                 dump user and role definitions for the specified database

      --excludeCollection=<collection-name>                 collection to exclude from the dump (may be specified multiple times to exclude additional collections)

      --excludeCollectionsWithPrefix=<collection-prefix>    exclude all collections from the dump that have the given prefix (may be specified multiple times to exclude additional prefixes)

  -j, --numParallelCollections=                             number of collections to dump in parallel (4 by default) (default: 4)

      --viewsAsCollections                                  dump views as normal collections with their produced data, omitting standard collections

The mongorestore command also has many options; the mandatory ones relate to connectivity, such as host, port, and authentication. There are other parameters too, like -j to restore collections in parallel, -c or --collection for a specific collection, and -d or --db to define a specific database. The full list of mongorestore options can be shown using --help:

root@n2:~# mongorestore --help

Usage:

  mongorestore <options> <directory or file to restore>



Restore backups generated with mongodump to a running server.



Specify a database with -d to restore a single database from the target directory,

or use -d and -c to restore a single collection from a single .bson file.



See http://docs.mongodb.org/manual/reference/program/mongorestore/ for more information.



general options:

      --help                                                print usage

      --version                                             print the tool version and exit



verbosity options:

  -v, --verbose=<level>                                     more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)

      --quiet                                               hide all log output



connection options:

  -h, --host=<hostname>                                     mongodb host to connect to (setname/host1,host2 for replica sets)

      --port=<port>                                         server port (can also use --host hostname:port)



kerberos options:

      --gssapiServiceName=<service-name>                    service name to use when authenticating using GSSAPI/Kerberos ('mongodb' by default)

      --gssapiHostName=<host-name>                          hostname to use when authenticating using GSSAPI/Kerberos (remote server's address by default)



ssl options:

      --ssl                                                 connect to a mongod or mongos that has ssl enabled

      --sslCAFile=<filename>                                the .pem file containing the root certificate chain from the certificate authority

      --sslPEMKeyFile=<filename>                            the .pem file containing the certificate and key

      --sslPEMKeyPassword=<password>                        the password to decrypt the sslPEMKeyFile, if necessary

      --sslCRLFile=<filename>                               the .pem file containing the certificate revocation list

      --sslAllowInvalidCertificates                         bypass the validation for server certificates

      --sslAllowInvalidHostnames                            bypass the validation for server name

      --sslFIPSMode                                         use FIPS mode of the installed openssl library



authentication options:

  -u, --username=<username>                                 username for authentication

  -p, --password=<password>                                 password for authentication

      --authenticationDatabase=<database-name>              database that holds the user's credentials

      --authenticationMechanism=<mechanism>                 authentication mechanism to use



uri options:

      --uri=mongodb-uri                                     mongodb uri connection string



namespace options:

  -d, --db=<database-name>                                  database to use when restoring from a BSON file

  -c, --collection=<collection-name>                        collection to use when restoring from a BSON file

      --excludeCollection=<collection-name>                 DEPRECATED; collection to skip over during restore (may be specified multiple times to exclude additional collections)

      --excludeCollectionsWithPrefix=<collection-prefix>    DEPRECATED; collections to skip over during restore that have the given prefix (may be specified multiple times to exclude additional prefixes)

      --nsExclude=<namespace-pattern>                       exclude matching namespaces

      --nsInclude=<namespace-pattern>                       include matching namespaces

      --nsFrom=<namespace-pattern>                          rename matching namespaces, must have matching nsTo

      --nsTo=<namespace-pattern>                            rename matched namespaces, must have matching nsFrom



input options:

      --objcheck                                            validate all objects before inserting

      --oplogReplay                                         replay oplog for point-in-time restore

      --oplogLimit=<seconds>[:ordinal]                      only include oplog entries before the provided Timestamp

      --oplogFile=<filename>                                oplog file to use for replay of oplog

      --archive=<filename>                                  restore dump from the specified archive file. If flag is specified without a value, archive is read from stdin

      --restoreDbUsersAndRoles                              restore user and role definitions for the given database

      --dir=<directory-name>                                input directory, use '-' for stdin

      --gzip                                                decompress gzipped input



restore options:

      --drop                                                drop each collection before import

      --dryRun                                              view summary without importing anything. recommended with verbosity

      --writeConcern=<write-concern>                        write concern options e.g. --writeConcern majority, --writeConcern '{w: 3, wtimeout: 500, fsync: true, j: true}'

      --noIndexRestore                                      don't restore indexes

      --noOptionsRestore                                    don't restore collection options

      --keepIndexVersion                                    don't update index version

      --maintainInsertionOrder                              preserve order of documents during restoration

  -j, --numParallelCollections=                             number of collections to restore in parallel (4 by default) (default: 4)

      --numInsertionWorkersPerCollection=                   number of insert operations to run concurrently per collection (1 by default) (default: 1)

      --stopOnError                                         stop restoring if an error is encountered on insert (off by default)

      --bypassDocumentValidation                            bypass document validation

      --preserveUUID                                        preserve original collection UUIDs (off by default, requires drop)

Restoring a specific collection in MongoDB can be done using the --collection parameter of mongorestore. Assume you have an orders database, and inside it there are some collections, as shown below:

my_mongodb_0:PRIMARY> show dbs;

admin   0.000GB

config  0.000GB

local   0.000GB

orders  0.000GB

my_mongodb_0:PRIMARY> use orders;

my_mongodb_0:PRIMARY> show collections;

order_details

orders

stock

We have already scheduled a backup of the orders database, and we want to restore the stock collection into a new database order_new on the same server. If you want to use the --collection option, you need to pass the collection name as a parameter to mongorestore together with the path to its .bson file; alternatively, you can use the --nsInclude={db}.{collection} option if you do not specify the path to the collection file (a sketch of this alternative appears after the verification output below).

root@n2:~/dump/orders# mongorestore -umongoadmin --authenticationDatabase admin --db order_new --collection stock /root/dump/orders/stock.bson

Enter password:

2020-03-09T04:06:29.100+0000 checking for collection data in /root/dump/orders/stock.bson

2020-03-09T04:06:29.110+0000 reading metadata for order_new.stock from /root/dump/orders/stock.metadata.json

2020-03-09T04:06:29.134+0000 restoring order_new.stock from /root/dump/orders/stock.bson

2020-03-09T04:06:29.202+0000 no indexes to restore

2020-03-09T04:06:29.203+0000 finished restoring order_new.stock (1 document)

2020-03-09T04:06:29.203+0000 done

You can check the collection in the order_new database as shown below:

my_mongodb_0:PRIMARY> use order_new;

switched to db order_new

my_mongodb_0:PRIMARY> show collections;

stock
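As an alternative to --collection, you can point mongorestore at the dump directory and filter by namespace. Here is a hedged sketch using the namespace options from the help output above; it assumes the same dump layout under /root/dump and renames orders.stock to order_new.stock on the way in:

$ mongorestore -umongoadmin --authenticationDatabase admin --nsInclude 'orders.stock' --nsFrom 'orders.stock' --nsTo 'order_new.stock' /root/dump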

How to Restore a mongodump Backup Using ClusterControl

Restoring a backup dump through ClusterControl is easy and takes only a few steps. If you have enabled a backup schedule, there will be many backup files in the list, along with useful information about each backup, such as its status (completed or failed), the backup method used, the list of databases, and the size of the dump. The steps to restore MongoDB data via ClusterControl are as follows:

Step One

Follow the prompts to restore backup to a node as shown below...

Step Two

You need to choose which backup needs to be restored.

Step Three

Review the summary...

 

 

Multi-DC PostgreSQL: Setting Up a Standby Node at a Different Geo-Location Over a VPN


Previously, we wrote about Setting Up a Geo-Distributed Database Cluster Using MySQL Replication. This time, it's about PostgreSQL. Setting up a geo-distributed cluster for PostgreSQL is not a new concept and the topology is quite common. 

To achieve high availability, organizations and companies are dispersing their database nodes so that when a catastrophic event happens in a specific region (which affects your data center) you have your standby nodes available for failover.

Using this type of topology is a very common practice as part of your organization's Business Continuity and Disaster Recovery plans, since it removes the single point of failure (SPOF). It is a common requirement, especially if you have a low RPO and a demanding uptime target (as many nines as you can get).

In this blog, I'll take a simple implementation on how to do this using ClusterControl. ClusterControl is an agentless management and automation software for database clusters. It helps deploy, monitor, manage, and scale your database server/cluster directly from ClusterControl user interface.

The Desired Architectural Setup

The target outcome here is an efficient deployment in a secure environment. To achieve this, it's important to establish the connection between sites over a VPN, and it would be even more secure if you also set up your database nodes to communicate over TLS/SSL. For this blog, we simply deploy a node over a VPN and showcase how easily you can take this approach. See below for the diagram of the target setup:

To elaborate on the setup: the on-premise network communicates with the public cloud over a VPN tunnel, and both networks have a VPN gateway so they can establish a connection. ClusterControl requires visibility of all the nodes to be registered, as it collects metrics from them. Aside from that, your on-prem active writer node must also be able to reach the standby node in the other domain, which for this blog is hosted in Google Cloud Platform (GCP).

Setting Up Your OpenVPN

OpenVPN setup can be tricky for both network domains. The gist of it is that the setup must satisfy the following considerations:

  • Nodes from your on-prem network shall be able to establish a connection to the target public cloud nodes
  • Nodes from your on-prem network may need internet access to download the packages required for setup (unless you have all the required repositories stored locally)
  • Nodes from your public cloud domain shall be able to establish a connection to the on-premise nodes
  • Nodes from your public cloud domain may need internet access to download the packages required for setup (unless you have all the required repositories stored locally)

OpenVPN Installation and Configuration

Step One

Install the openvpn package (and easy-rsa packages for Ubuntu/Debian distros)

$ sudo apt-get install openvpn easy-rsa

For CentOS/RHEL based OS, 

$ sudo yum install openvpn wget

$ wget -O /tmp/easyrsa https://github.com/OpenVPN/easy-rsa-old/archive/2.3.3.tar.gz

Step Two

Generate your certificates such as certification authority (CA), server, and client certificates. 

For Ubuntu/Debian, you can do the following:

$ /usr/bin/make-cadir CA

Change to CA directory

$ cd CA

At this point, you will likely want to edit the vars file according to your needs, e.g.

export KEY_COUNTRY="SE"

export KEY_PROVINCE="SMD"

export KEY_CITY="Kalmar"

export KEY_ORG="Severalnines"

export KEY_EMAIL="paul@s9s.io"

export KEY_CN="S9s"

export KEY_NAME="server"

export KEY_OU="Support Unit"

Then execute the vars script to define the required environment variables:

[ ~/CA ]$ source ./vars

NOTE: If you run ./clean-all, I will be doing a rm -rf on /CA/keys

Run a clean-up

[ ~/CA ]$ ./clean-all

Then build the certificates for your CA, server, and client.

[ ~/CA ]$ ./build-ca

[ ~/CA ]$ ./build-key-server server

[ ~/CA ]$ ./build-dh 2048

[ ~/CA ]$ ./build-key client

Lastly, generate a Perfect Forward Secrecy key.

$ openvpn --genkey --secret pfs.key
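The client side will need its own copies of ca.crt, client.crt, client.key, and pfs.key, which the client configuration later in this post refers to. A typical transfer could look like the following, where user@client-host is a placeholder for your client machine and keys/ is the easy-rsa output directory:

$ scp keys/ca.crt keys/client.crt keys/client.key pfs.key user@client-host:~/openvpn/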

If you're using CentOS/RHEL type distros, you can do the following:

$ tar xfz /tmp/easyrsa

$ sudo mkdir /etc/openvpn/easy-rsa

$ sudo cp -rf easy-rsa-old-2.3.3/easy-rsa/2.0/* /etc/openvpn/easy-rsa

# Ensure your RSA keys are on the right permission for security purposes

$ sudo chown vagrant /etc/openvpn/easy-rsa/

$ sudo cp /usr/share/doc/openvpn-2.4.4/sample/sample-config-files/server.conf /etc/openvpn

$ sudo mkdir /etc/openvpn/easy-rsa/keys

$ sudo nano /etc/openvpn/easy-rsa/vars

$ cd /etc/openvpn/easy-rsa

At this point, you will likely want to edit the vars file according to your needs, e.g.

export KEY_COUNTRY="SE"

export KEY_PROVINCE="SMD"

export KEY_CITY="Kalmar"

export KEY_ORG="Severalnines"

export KEY_EMAIL="paul@s9s.io"

export KEY_CN="S9s"

export KEY_NAME="server"

export KEY_OU="Support Unit"

Then execute the vars script to define the required environment variables:

$ source ./vars

NOTE: If you run ./clean-all, I will be doing a rm -rf on /CA/keys

Run a clean-up

$ ./clean-all

Then build the certificates for your CA, server, and client.

$ ./build-ca

$ ./build-key-server server

$ ./build-dh 2048

$ cd /etc/openvpn/easy-rsa

$ ./build-key client

$ cp /etc/openvpn/easy-rsa/openssl-1.0.0.cnf /etc/openvpn/easy-rsa/openssl.cnf

Once you have everything set up, you must take note of where your keys and certificates are placed. If you're using systemd or service in Linux to run OpenVPN, you should place your certificates and keys in /etc/openvpn. You will likely have to run the following command:

sudo cp dh2048.pem ca.crt server.crt server.key /etc/openvpn
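Note that my server configuration below references the keys under /etc/openvpn/keys, so if you follow that layout, create the directory and copy the files (including the PFS key) there instead:

$ sudo mkdir -p /etc/openvpn/keys

$ sudo cp dh2048.pem ca.crt server.crt server.key pfs.key /etc/openvpn/keys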

Step Three

At this point, I ended up with the following server and client configurations. See my configuration files below.

OpenVPN Server Config

$ cat /etc/openvpn/server-ovpn.conf 

port 1194

proto udp

dev tun

ca /etc/openvpn/keys/ca.crt

cert /etc/openvpn/keys/server.crt

key /etc/openvpn/keys/server.key # This file should be kept secret

dh /etc/openvpn/keys/dh2048.pem

cipher AES-256-CBC

auth SHA512

server 10.8.0.0 255.255.255.0

client-to-client

topology subnet

push "route 192.168.30.0 255.255.255.0"

#push "redirect-gateway def1 bypass-dhcp"

#push "redirect-gateway"

push "dhcp-option DNS 8.8.8.8"

push "dhcp-option DNS 8.8.4.4"

ifconfig-pool-persist ipp.txt

keepalive 10 120

comp-lzo

persist-key

persist-tun

#status openvpn-status.log

#log-append  openvpn.log

verb 3

tls-server

tls-auth /etc/openvpn/keys/pfs.key

The most important options you need to take into account are the following.

client-to-client - Very important, so that nodes in the VPN can ping nodes in the other network domain. Say ClusterControl is located on-prem; it can then ping the nodes in GCP.

push "route 192.168.30.0 255.255.255.0" - I push the routing tables so that GCP node(s) connected to the VPN can ping my nodes in the on-premise domain. In my GCP VPN gateway, I have the corresponding routing table as push "route 10.142.0.0 255.255.255.0".

#push "redirect-gateway def1 bypass-dhcp",

#push "redirect-gateway" - Both of these lines are commented out (not required), since both sides need a direct internet connection to set up repositories and dependent packages during installation.

push "dhcp-option DNS 8.8.8.8",

push "dhcp-option DNS 8.8.4.4" - Both of these can be changed to your preferred DNS servers if needed, which matters especially when you need an internet connection.
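Once both ends are connected, a quick sanity check is to ping across the tunnel. The addresses below follow this setup: 10.8.0.1 is the OpenVPN server's tunnel address in the 10.8.0.0/24 pool, while 192.168.30.10 is a hypothetical on-prem host inside the pushed 192.168.30.0/24 route:

# From a GCP node connected to the VPN, reach an on-prem node
$ ping -c 3 192.168.30.10

# From any connected client, reach the VPN server's tunnel address
$ ping -c 3 10.8.0.1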

OpenVPN Client Config

$ cat openvpn/client-vpn.ovpn 

client

dev tun

proto udp

remote 34.73.238.239  1194  

ca ca.crt

cert client.crt

key client.key

tls-version-min 1.2

tls-cipher TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256:TLS-ECDHE-ECDSA-WITH-AES-128-GCM-SHA256:TLS-ECDHE-RSA-WITH-AES-256-GCM-SHA384:TLS-DHE-RSA-WITH-AES-256-CBC-SHA256

cipher AES-256-CBC

auth SHA512

resolv-retry infinite

auth-retry none

nobind

persist-key

persist-tun

ns-cert-type server

comp-lzo

verb 3

tls-client

tls-auth pfs.key

The most important thing here is to be sure of your key paths, and to replace the parameters in this line,

remote 34.73.238.239  1194  

with the hostname/IP address of your VPN server gateway.
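To bring the tunnel up, something along these lines should work; the systemd unit name assumes the server configuration file is /etc/openvpn/server-ovpn.conf as shown above:

# On the VPN server
$ sudo systemctl enable --now openvpn@server-ovpn

# On the client, using the .ovpn profile
$ sudo openvpn --config openvpn/client-vpn.ovpn --daemon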

Step Four

Lastly, set up NAT on the VPN server so that network packets are routed out of the server's network interface, and allow the kernel to forward IPv4 traffic:

sudo iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE

echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
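Note that the setting above only lasts until reboot; to persist IPv4 forwarding, a common approach is to enable it via sysctl:

echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf

sudo sysctl -p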

For more in-depth installation, I suggest you look at these posts for CentOS and on Ubuntu.

Extending Over the Cloud Efficiently

Suppose you have the following topology in your on-prem domain,

and you now want to extend your availability to another datacenter, which is GCP for this blog. Deploying using ClusterControl is very straightforward. You can follow the procedure below.

Step One

Create A Slave Cluster

Step Two

Choose your master for replication,

Step Three

Set up the access to your public cloud environment

Step Four

Specify the hostname/IP of your node to be extended into your PG replication cluster,

Step Five

Lastly, monitor the job activity to see how ClusterControl reacts to this type of action.

The outcome will show you the linkage between your on-prem setup and your extended datacenter, which in this blog is our GCP PostgreSQL standby node. See below for the result:

Conclusion

Setting up a geo-located standby node is not difficult, but the main issue is how secure it will be in your architectural design. Using a VPN alleviates the main security concern. OpenVPN is a simple way to implement this, but for heavy transactional applications, organizations are likely to invest in higher-end services or hardware to handle such a setup. Adding TLS/SSL on top is easier said than done, and we'll discuss how you can use TLS/SSL with PostgreSQL in our next blogs.

The Battle of the NoSQL Databases - Comparing MongoDB and Oracle NoSQL


Modern IT needs non-relational, dynamic schemas (meaning no requirement for join queries) to support Big Data and real-time applications. NoSQL databases were created with the notion of improving data processing performance and scaling out across multiple hosts to overcome distributed database load, and they have won over the new generation's demand for data processing.

Besides providing the essential support for various data models and scripting languages, MongoDB also makes it easy for developers to get started.

NoSQL databases open the doors to... 

  • Text-based protocols using a scripting language (REST, JSON, BSON)
  • Minimal cost to generate, store and transport data
  • Support for huge amounts of data processing
  • Increased write performance
  • No need for object-relational mapping or a normalization process
  • No rigid controls with referential integrity rules
  • Reduced maintenance cost for database administrators
  • Lower expansion cost
  • Fast key-value access
  • Advancing support for machine learning and artificial intelligence

MongoDB Market Acceptance 

The modern need for Big Data analytics and modern applications plays a crucial role in the drive to improve the lifecycle of data processing, without expecting hardware expansion and cost increases.

If you are planning a new application and you want to choose a database, arriving at the right decision among the many database options on the market can be a complicated process.

The DB-Engines popularity ranking shows that MongoDB stands at No. 1 compared to Oracle NoSQL (which is placed at No. 74). The trend, however, indicates that something is changing: the need for cost-effective expansion goes hand in hand with much simpler data modelling and administration, and this is transforming how developers decide what is best for their systems.

According to Datanyze market share information to date, there are about 289 websites running on Oracle NoSQL with a market share of 11%, whereas MongoDB runs on a full 12,185 websites with a market share of 4.66%. These impressive numbers indicate that there is a bright future for MongoDB.

NoSQL Data Modeling 

Data modelling requires an understanding of...

  • The types of data you currently have
  • The types of data you expect in the future
  • How your application will gain access to required data in the system
  • How your application will fetch required data for processing

The exciting thing for those who have always followed the Oracle way of creating schemas first and then storing the data is that MongoDB allows creating a collection along with the document. This means a collection does not have to exist before document creation takes place, making MongoDB much appreciated for its flexibility.

In Oracle NoSQL, however, the table definition has to be created first, after which you can continue to create the rows.

The next cool thing is that MongoDB does not impose strict rules on schema and relation implementation, which gives you the freedom to continuously improve the system without worrying much about maintaining a tight schema design.
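As a quick illustration, here is a minimal mongo shell sketch (using a hypothetical inventory collection): the collection is created implicitly on the first insert, and documents within it are free to carry different fields:

// Inserting into a collection that does not exist yet creates it implicitly
db.inventory.insert({ sku: "A100", qty: 25 })

// A second document in the same collection may carry extra fields
db.inventory.insert({ sku: "B200", qty: 10, tags: ["new", "sale"] })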

Let’s look at some of the comparisons between MongoDB and Oracle NoSQL.

Comparing NoSQL Concepts in MongoDB and Oracle

NoSQL Terminologies

| MongoDB    | Oracle NoSQL | Facts                                                                 |
|------------|--------------|-----------------------------------------------------------------------|
| Collection | Table / View | Collection / table act as the storage container; they are similar but not identical. |
| Document   | Row          | For MongoDB, data is stored in a collection, in the form of documents and fields. For Oracle NoSQL, a table is a collection of rows, where each row holds a data record. Each table row consists of key and data fields, which are defined when a table is created. |
| Field      | Column       |                                                                       |
| Index      | Index        | Both databases use an index to improve the speed of searches carried out in the database. |

Document Store and Key-Value Store 

Oracle NoSQL provides a storage system that stores values indexed by a key; this concept is viewed as the least complex model, as the datasets consist of indexed key-value pairs. The records are organised using major keys and minor keys.

The major key can be viewed as the object pointer and the minor key as the fields in the record. Efficient search of the data is made possible by using the key as the mechanism to access the data, just like a primary key.

MongoDB extends key-value pairs. Each document has a unique key, which serves the purpose of retrieving the document. Documents have a dynamic schema, as documents in a collection do not need to have the same set of fields. A collection can have a common field with different types of data. These attributes let the document data model map directly to modern object-oriented languages.

MongoDB is a document store; see the example figure "MongoDB - Document Store format". Oracle NoSQL is a key-value store; see the example figure "Oracle NoSQL Key-value Store".

BSON and JSON

Oracle NoSQL uses JSON as a standard data format to transmit data plus attribute-value pairs. On the other hand, MongoDB uses BSON.

| MongoDB | Oracle NoSQL |
|---------|--------------|
| BSON (Binary JSON): a binary data format that induces faster processing | JSON (JavaScript Object Notation): a standard text format, with much slower processing compared to BSON |

Characteristics:

BSON is not human-readable text, unlike JSON. BSON stands for binary-encoded serialization of JSON-like data, and it is mainly used as the storage and network transfer format with MongoDB. The BSON format consists of a list of ordered elements, each containing a field name (string), a type, and a value. As for data types, BSON supports all the datatypes commonly found in JSON and adds two more (Binary Data and Date). Binary data, known as BinData, that is less than 16MB can be stored directly in MongoDB documents. BSON is said to consume more space than JSON data documents.

There are two reasons why MongoDB consumes more space compared to Oracle NoSQL:

  • MongoDB achieved the objective of fast traversal, and enabling fast traversal requires the BSON document to carry additional metadata (the length of strings and subobjects).
  • BSON is designed to encode and decode fast. For example, integers are stored as 32 (or 64) bit integers to eliminate parsing to and from text. This uses more space than JSON for small integers, but is much faster to parse.
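For example, in the mongo shell you can inspect the encoded size of a document directly. Per the BSON specification, the tiny document below encodes to 12 bytes: a 4-byte length prefix, a 1-byte type tag, the field name "x" plus its null terminator, a 4-byte int32 value, and a trailing null byte:

// Returns the BSON-encoded size of the document in bytes
> Object.bsonsize({ x: NumberInt(1) })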

Data Model Definition

MongoDB Collection Statement

Create a collection

db.createCollection("users")

Creating a collection implicitly by inserting a document (with an automatic _id):

db.users.insert({
    User_id: "U1",
    First_name: "Mary",
    Last_name: "Winslet",
    Age: 15,
    Contact: {
        Phone: "123-456-789",
        Email: "mary@example.com"
    },
    access: {
        Level: 5,
        Group: "dev"
    }
})

MongoDB allows the related pieces of information to be embedded in the same database record.

Oracle NoSQL Table Statement

Using the SQL CLI to set up a namespace:

Create namespace newns1;

Using the namespace to associate tables and child tables:

newns1:users

newns1:users.access

Create a table with an IDENTITY column:

Create table newns1.user (

idValue INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 MAXVALUE 10000), 

User_id String,

First_name String,

Last_name String, 

Contact Record (Phone string,         

                Email string),

Primary key (idValue));

Create a table with a JSON column:

Create table newns1.user (

idValue INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),

User_profile JSON, 

Primary Key (shard(idValue),User_id));

A row for the user table (JSON type):

{
  "id": "U1",
  "User_profile": {
    "First_name": "Mary",
    "Lastname": "Winslet",
    "Age": 15,
    "Contact": {
      "Phone": "123-456-789",
      "Email": "mary@example.com"
    }
  }
}

Based on the data definitions above, MongoDB allows different methods of schema creation. A collection can be defined explicitly, or implicitly during the first insert of data into a document. When creating a collection, you can define an ObjectId. The ObjectId is the primary key for MongoDB documents; it is a 12-byte binary BSON type whose bytes are generated by the MongoDB drivers and the server using a default algorithm. The ObjectId is useful and serves the purpose of sorting the documents created in a specific collection.
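For instance, in the mongo shell you can extract the creation time embedded in an ObjectId (a small sketch against the users collection created earlier):

// Each ObjectId embeds a creation timestamp in its first 4 bytes;
// getTimestamp() extracts it, which is why sorting by _id roughly
// follows insertion order
> db.users.findOne()._id.getTimestamp()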

Oracle NoSQL has several ways to start defining tables. If you are using the Oracle SQL CLI, by default a new table will be placed in sysdefault until you decide to create a new namespace to associate a set of new tables with it. The example above demonstrates the new namespace "newns1" being created, with the user table associated with that namespace.

Besides identifying the primary key, Oracle NoSQL also uses an IDENTITY column to auto-increment a value each time you add a row. The IDENTITY value is auto-generated and must be an integer, long, or number data type. In Oracle NoSQL, IDENTITY is associated with a sequence generator, similar in concept to the ObjectId in MongoDB. Oracle NoSQL allows the IDENTITY column to be used as the primary key; if you are considering this, careful thought is required, as it may have an impact on how data insertion and updates take place.

The MongoDB and Oracle NoSQL table/collection level definitions show how the 'contact' information is embedded into the same single structure without requiring additional schema definition. The benefit of embedding a dataset is that no further queries are necessary to retrieve it.

If you are looking to keep your system in a simple form, MongoDB provides the best option for retaining data documents with less complication. At the same time, MongoDB provides the capabilities to deliver existing complex data models from relational schemas using its schema validation tool.

Oracle NoSQL provides the capability to use an SQL-like query language with DDL and DML, which requires much less effort from users who have experience with relational database systems.

The MongoDB shell uses JavaScript, and if you are not comfortable with the language or with the use of the mongo shell, then the best fit is to use an IDE tool. The top 5 MongoDB IDE tools in 2020, such as Studio 3T, Robo 3T, NoSQLBooster, MongoDB Compass and Nucleon Database Master, will help you create and manage complex queries with the use of aggregation features.

Performance and Availability

As the MongoDB data structure model uses documents and collections, using the BSON data format for processing huge amounts of data becomes much faster compared to Oracle NoSQL. While some consider querying data with SQL a more comfortable pathway for many users, capacity becomes an issue. When there is a huge amount of data to support, a need for increased throughput, and complex SQL queries to design, these factors force a rethink of server capacity and of costs that increase over time.

Both MongoDB and Oracle NoSQL provide sharding and replication features. Sharding is a process that allows the dataset and the overall processing load to be distributed across multiple physical partitions to increase processing (read/write) speed. The implementation of sharding with Oracle requires prior knowledge of how sharding keys work, because the shard key has to be defined at schema initiation.

The implementation of sharding with MongoDB gives you room to work on your dataset first, identifying a suitable shard key based on query patterns before implementation. As the sharding process includes data replication, MongoDB has a reputation for fast data replication as well. Replication provides fault tolerance, avoiding the need to keep all data on a single server.

Conclusion 

What makes MongoDB preferred over Oracle NoSQL is its binary data format and its inborn characteristics of being lightweight, traversable, and efficient. This allows it to support advancing modern applications in the areas of machine learning and artificial intelligence.

MongoDB's characteristics enable developers to work much more confidently and build modern applications faster. The MongoDB data model allows the processing of huge amounts of unstructured data with improved speed compared to Oracle NoSQL. Oracle NoSQL wins when it comes to the tools it offers and the options available for creating data models. However, it is essential that developers and designers can learn and adapt to a technology quickly, which is not the case with Oracle NoSQL.
