Exadata Cloud at Customer – number of active CPUs and adding a new database
Imagine a typical working day: you get a request to add a new database to your Exadata Cloud at Customer (ExaCC). If you are not familiar with the product, Oracle’s documentation describes it in detail. In short, it is an Exadata machine with a cloud interface, something like Oracle Exadata Cloud Service, but with the hardware installed in your datacenter.
You carry on with the request, fill in all the information on the database creation page, and click the “Create Database” button.
Everything looks correct; you’ve probably done it several times already and don’t expect any surprises. But after a while you get a notification that your request has failed and the database was not created.
The first time it happened to me, I was disappointed by the lack of any useful information in the OCI console and logs. I was expecting a bit more than just “failed due to an unknown error”.
This is where a quick troubleshooting session starts. I went to the logs on the VM cluster and, after some research, found the real reason why the request had failed.
[root@vorade1 ~]# view /var/opt/oracle/ocde/ocde_createdb_CDBDEV.out
ERROR : rac: _is_cpu_count_ok, cpu_count 4 is not enough for running_dbs_count 10. Please increase the number of cpus.At a minimum we should be one vcpu per two DBs.
ERROR: OCDE createdb pre-reqs failed, please check logs
corereg: secure: Wallet location is not defined, securing corereg means losing all credentials.
INFO: corereg: secure: Removed all entries matching passwd or decrypt_key from corereg file /var/opt/oracle/creg/CDBDEV.ini
INFO: Total time taken by ocde is 7 seconds
OCDE failed with message: ERROR: OCDE createdb pre-reqs failed, please check logs
INFO : ocde_time_format is 2020/03/25 11:38:24
OCDE failed with message: ERROR: OCDE createdb pre-reqs failed, please check logs
#### Completed OCDE with errors, please check logs ####
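If you have to check this log more than once, pulling the two numbers out of the error line can be scripted. The snippet below is only a sketch: the regular expression is built from the single error line shown in the log, and the helper name find_cpu_check_failure is made up for illustration. Other ExaCC tool versions may word the message differently.

```python
import re

# Pattern assumed from the one error line shown in the log; not an official format.
CPU_CHECK = re.compile(
    r"_is_cpu_count_ok, cpu_count (\d+) is not enough for running_dbs_count (\d+)"
)

def find_cpu_check_failure(log_text):
    """Return (cpu_count, running_dbs_count) if the CPU pre-check failed, else None."""
    match = CPU_CHECK.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Sample taken from the log output shown in this post.
sample = (
    "ERROR : rac: _is_cpu_count_ok, cpu_count 4 is not enough for "
    "running_dbs_count 10. Please increase the number of cpus."
)
print(find_cpu_check_failure(sample))
```

On a real system you would pass in the contents of the ocde_createdb_<DBNAME>.out file instead of the sample string.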
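Digging a little deeper into the Oracle ExaCC tools, we can find the Perl module “rac.pm” and its “_is_cpu_count_ok()” function. Judging by the error message, that function compares the number of running database instances with twice the CPU count and fails the pre-check when there are more than two databases per CPU. In the log above, cpu_count 4 allows at most 8 databases, while running_dbs_count is already 10. As a result, we have a hardcoded limit on the number of container databases on the ExaCC, which is bound to twice the number of OCPUs enabled for the VM cluster.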
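To make the rule concrete, here is a minimal Python sketch of the check as I understand it from the error message. It is not the Perl code from rac.pm. The constant DBS_PER_CPU and the function name is_cpu_count_ok are my own, and two details are assumptions: whether the comparison is strict, and whether the database being created is included in the count.

```python
# Ratio taken from the error text: "one vcpu per two DBs".
DBS_PER_CPU = 2

def is_cpu_count_ok(cpu_count, running_dbs_count):
    """Sketch of the pre-check: at most DBS_PER_CPU databases per CPU.

    Assumption: the comparison is non-strict and counts only the
    databases reported as running_dbs_count in the log.
    """
    return running_dbs_count <= DBS_PER_CPU * cpu_count

# Values from the failed request in the log: 4 CPUs, 10 databases.
print(is_cpu_count_ok(4, 10))
# With one more CPU the same 10 databases would fit under this sketch.
print(is_cpu_count_ok(5, 10))
```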
We can apply two workarounds for the issue. The first is to scale up the number of OCPUs on the ExaCC. You don’t have to keep them enabled all the time. You can burst up only for the database creation and then scale back down to the original number.
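For the first workaround it helps to know how far to scale up. The sketch below assumes the rule is exactly two databases per CPU, as the error message suggests. The function name min_cpus_for is hypothetical, and whether you need to count the new database on top of the running ones is something I have not verified, so treat the result as a lower bound.

```python
import math

def min_cpus_for(db_count, dbs_per_cpu=2):
    """Smallest CPU count that keeps db_count within dbs_per_cpu per CPU.

    Assumption: the limit is exactly dbs_per_cpu databases per CPU.
    """
    return math.ceil(db_count / dbs_per_cpu)

# The 10 databases from the log, then the same plus the one being created.
print(min_cpus_for(10))
print(min_cpus_for(11))
```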
The second workaround is to shut down one or more of the existing databases while the new one is being created. I don’t think it is the best course of action, but it might be acceptable if you know that some databases can be stopped at certain times.
The summary is short. We have a hardcoded limit of 2*OCPU for the number of databases, which can be worked around in a couple of ways. The level of logging in the OCI interface is not adequate, so you need to dig into the logs yourself or open an Oracle Support SR to get the real cause of the error.