To minimize the risk of data loss should there be a failure in the data center that hosts meshIQ Data Services SolrCloud clusters, you must periodically copy the data in Solr to a separate data center. This can be accomplished by using the Backup and Restore features of the Data Services Database Maintenance Utility. (See the Data Services Database Maintenance Utility Guide for details.) This command-line utility will need to be run periodically to take backups of the necessary Data Services Solr collections.
By default, the utility will back up all non-reference Data Services Solr collections. (Reference Solr collections are those whose names start with “jkoolref.” There’s no need to back these up, as they are initialized as part of installation setup and are not modified by Data Services.) Rather than running the backup utility using the default setting, however, it is recommended that you configure it to back up only specific collections. This approach allows multiple instances of the utility to be used to back up specific groups of Solr collections at different intervals, depending on your requirements.
There are three classes of Data Services Solr collections (in addition to the reference ones):
- Administration collections (those whose collection name starts with “jkooladmin”).
- Definition collections, which hold object definitions, like Sets and Triggers. These collections are updated as a result of user interactions.
- Streaming-data collections, which hold records generated from processing streamed data. These collections include:
  - activities
  - datasets
  - events
  - logs
  - relationships
  - resources
  - snapshots
  - sources
Although the jkool.logs collection is not updated as a direct result of streaming, much of the activity that occurs in Data Services is logged there, and it can grow quite large. For backup purposes, if you need to back it up, treat it as a streaming-data collection.
At any given time, there must be only one active backup running per collection. Keep in mind that backups of larger collections may take some time. Therefore, the decision concerning how often to run the backups is a tradeoff between how much data loss you’re willing to accept in the event of a disaster and how long the backups will take. The general recommendation is to define two separate CRON jobs (see About CRON below) that are run at different times. Since they will share a common log file, it is preferable that you set up these times so that their executions do not overlap:
- One for the administration and definition collections that would run the backups synchronously, twice daily. The Database Maintenance Utility would be run as follows:
jkool-db-maint.sh -backup -src:http://<solr-host>:8983 -f:/XrayBackups -name:XrayProd -tables:jkooladmin.registeredusers,jkooladmin.organization,jkooladmin.repositories,jkooladmin.accesstokens,jkooladmin.volumes,jkool.actions,jkool.dictionaries,jkool.inputdatarules,jkool.mlmodel,jkool.providers,jkool.scripts,jkool.sets,jkool.triggers,jkool.views,jkool.viewtemplates
- One for the streaming-data collections that would run the backups asynchronously, once weekly, preferably during a time of low streaming volume. The Database Maintenance Utility would be run as follows:
jkool-db-maint.sh -backup -src:http://<solr-host>:8983 -f:/XrayBackups -name:XrayProd -async -tables:jkool.activities,jkool.datasets,jkool.events,jkool.relationships,jkool.resources,jkool.snapshots,jkool.sources
The above examples omit the following Solr collections, since this information is not considered critical:
- jkool.jobs
- jkool.logs
- jkooladmin.quotausage
If you need to retain this information, include these collections with the streaming-data collections (example 2 above) rather than with the administration and definition collections, because they have the potential to grow quite large.
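As a minimal sketch of that variation on example 2, the following script assembles the command line with or without the optional collections. It only prints the command; it does not run the utility. The function name streaming_backup_command, the "with-optional" argument, and the host name solr.example.com are hypothetical and used only for illustration; the flags and collection names are copied from the examples above.

```shell
#!/bin/sh
# Sketch only: prints the weekly streaming-data backup command; it does not run it.

# Values copied from example 2. SOLR_URL uses a hypothetical host name.
SOLR_URL="http://solr.example.com:8983"
BACKUP_DIR="/XrayBackups"
BACKUP_NAME="XrayProd"

# Collections from example 2, plus the three collections the examples omit.
STREAMING="jkool.activities,jkool.datasets,jkool.events,jkool.relationships,jkool.resources,jkool.snapshots,jkool.sources"
OPTIONAL="jkool.jobs,jkool.logs,jkooladmin.quotausage"

# Print the command line. Pass "with-optional" to append the optional collections.
streaming_backup_command() {
  tables="$STREAMING"
  if [ "${1:-}" = "with-optional" ]; then
    tables="$tables,$OPTIONAL"
  fi
  printf '%s\n' "jkool-db-maint.sh -backup -src:$SOLR_URL -f:$BACKUP_DIR -name:$BACKUP_NAME -async -tables:$tables"
}

streaming_backup_command with-optional
```

Keeping the optional names in a separate variable makes it easy to see which collections go beyond the recommended set and to drop them again later.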
The synchronous backups in example 1 above will log the success or failure of the backup to the log file. The asynchronous ones only log that they are started, so the log files do not indicate whether they were successful. You must monitor the status of these manually, using jkool-db-maint.sh -status. See the Data Services Database Maintenance Utility Guide for details.
About CRON
CRON is the process that runs scheduled jobs. These jobs are defined in a file named “crontab.” Refer to the links below for more information.
Syntax: crontab Man Page - Linux - SS64.com
Ubuntu: How do I set up a Cron job? - Ask Ubuntu
Redhat: Automate your Linux system tasks with cron | Enable Sysadmin (redhat.com)
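As a rough sketch, the two jobs recommended earlier might be defined in a crontab as follows. The schedule times (01:00 and 13:00 daily, and 03:00 on Sundays), the install path /opt/meshiq/bin, and the host name solr.example.com are assumptions chosen for illustration; adjust them to your environment and to a period of low streaming volume. Each entry must stay on a single line. Note that cron hands the command to a shell, so a literal <solr-host> placeholder would be interpreted as redirection and must be replaced with a real host name.

```text
# m  h     dom mon dow  command
# Administration and definition collections: synchronous, twice daily
0    1,13  *   *   *    /opt/meshiq/bin/jkool-db-maint.sh -backup -src:http://solr.example.com:8983 -f:/XrayBackups -name:XrayProd -tables:jkooladmin.registeredusers,jkooladmin.organization,jkooladmin.repositories,jkooladmin.accesstokens,jkooladmin.volumes,jkool.actions,jkool.dictionaries,jkool.inputdatarules,jkool.mlmodel,jkool.providers,jkool.scripts,jkool.sets,jkool.triggers,jkool.views,jkool.viewtemplates
# Streaming-data collections: asynchronous, once weekly (Sunday)
0    3     *   *   0    /opt/meshiq/bin/jkool-db-maint.sh -backup -src:http://solr.example.com:8983 -f:/XrayBackups -name:XrayProd -async -tables:jkool.activities,jkool.datasets,jkool.events,jkool.relationships,jkool.resources,jkool.snapshots,jkool.sources
```

The two-hour gap on Sundays is only a guess at how long the synchronous backup takes; widen it if the first job runs longer, so that the two executions do not overlap.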