How to reindex Solr when running on Kubernetes

If you need to run a full reindex in Solr, you’ll probably try the classic approach first, which starts by stopping Solr with the following command:

/opt/alfresco-search-services/solr/bin/solr stop

or, if you get the message “No Solr nodes found to stop.”, by adding the port to the command (we’re using 8983):

/opt/alfresco-search-services/solr/bin/solr stop -p 8983

Right after that, the pod gets restarted and you get kicked out of the container, with no chance to execute the remaining steps of the classic approach. If that’s your case, there is another way to run a full reindex, following these steps:

1. Access the Alfresco Search Services container
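
How you get a shell in the container depends on your cluster. The following is a minimal sketch, assuming kubectl access. The namespace (alfresco), the label selector (app=alfresco-search) and the helper names are hypothetical, so adapt them to your deployment:

```shell
# Hypothetical defaults: replace with the namespace and pod labels of your deployment.
NAMESPACE="${NAMESPACE:-alfresco}"
POD_SELECTOR="${POD_SELECTOR:-app=alfresco-search}"

# Print the name of the first pod matching the selector.
find_search_pod() {
  kubectl get pods -n "$NAMESPACE" -l "$POD_SELECTOR" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Open an interactive shell in that pod.
open_search_shell() {
  kubectl exec -it -n "$NAMESPACE" "$(find_search_pod)" -- /bin/bash
}
```

Run open_search_shell to get a prompt inside the container, then continue with the next step.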

2. Unload alfresco and archive cores

Once on the container, execute the following commands to unload the cores:

curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=alfresco&deleteIndex=true'
curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=archive&deleteIndex=true'

3. Check the cores are unloaded

Execute the following command to verify that no cores exist:
curl 'http://localhost:8983/solr/admin/cores?action=STATUS'

Output should look similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <lst name="initFailures"/>
    <lst name="status"/>
</response>

or
curl 'http://localhost:8983/solr/admin/cores?action=SUMMARY'

With no cores loaded, the request fails, so the output should look similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">400</int>
        <int name="QTime">14</int>
    </lst>
    <lst name="error">
        <lst name="metadata">
            <str name="error-class">org.apache.solr.common.SolrException</str>
            <str name="root-error-class">java.lang.NullPointerException</str>
        </lst>
        <str name="msg">Error executing implementation of admin request SUMMARY</str>
        <int name="code">400</int>
    </lst>
</response>

or check the core folders under:
/opt/alfresco-search-services/data
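
If you want to script the STATUS check instead of reading the XML by eye, a small helper can look for the empty status element shown in the first output above. This is a sketch only: no_cores_loaded is a hypothetical helper name, and it assumes the response is XML formatted as in that output.

```shell
# Succeeds (exit code 0) when the STATUS response read from stdin lists no cores,
# that is, when it contains the empty element <lst name="status"/>.
no_cores_loaded() {
  grep -q '<lst name="status"/>'
}
```

For example:

curl -s 'http://localhost:8983/solr/admin/cores?action=STATUS&wt=xml' | no_cores_loaded && echo "All cores unloaded"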

4. Remove alfrescoModels folder

Execute the following command to remove the content models stored by Solr. This is only required if a non-incremental change has been made to the models.

rm -fr /opt/alfresco-search-services/data/alfrescoModels

5. Restart the Alfresco Search Services container and recreate the cores

To recreate the cores, delete the Alfresco Search Services deployment on k8s and redeploy it, or execute the following command:
/opt/alfresco-search-services/solr/bin/solr restart -a "-Dcreate.alfresco.defaults=alfresco,archive"
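
Once Solr is back up, you may want to confirm that both cores were recreated before moving on. A minimal sketch, assuming the STATUS response is XML and lists each core as a named lst element; core_listed and cores_recreated are hypothetical helper names:

```shell
# Succeeds when the STATUS response on stdin contains an entry for the given core.
core_listed() {
  grep -q "<lst name=\"$1\">"
}

# Succeeds when both the alfresco and archive cores appear in the response on stdin.
cores_recreated() {
  status_xml=$(cat)
  printf '%s\n' "$status_xml" | core_listed alfresco &&
    printf '%s\n' "$status_xml" | core_listed archive
}
```

For example:

curl -s 'http://localhost:8983/solr/admin/cores?action=STATUS&wt=xml' | cores_recreated && echo "Cores recreated"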

6. Check Solr report for missing nodes on the index

Execute the following command:
curl 'http://localhost:8983/solr/admin/cores?action=REPORT&wt=xml'

7. Check indexing progress

Execute the following command to verify the number of documents indexed:
curl 'http://localhost:8983/solr/admin/cores?action=STATUS&core=alfresco'

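If you need to monitor or record indexing performance, run the following script in the background and follow the index-rate log file. Every 60 seconds it logs the total number of indexed documents and the difference from the previous sample: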

(
  log=/opt/alfresco-search-services/data/index-rate.log
  numDocs1=0
  echo "Starting log" >> "$log"
  # Sample the document count once a minute and log the change since the last sample
  while sleep 60; do
    numDocs2=$(curl -s 'http://localhost:8983/solr/admin/cores?action=STATUS&core=alfresco' | grep -oP '(?<=<int name="numDocs">).*?(?=</int>)')
    currentDate=$(date)
    diff=$((numDocs2 - numDocs1))
    echo "$currentDate, Total docs indexed: $numDocs2, rate: $diff docs/min" >> "$log"
    numDocs1=$numDocs2
  done
) & disown

tail -f /opt/alfresco-search-services/data/index-rate.log

8. Fix missing nodes

If there are missing nodes, execute the following command:
curl 'http://localhost:8983/solr/admin/cores?action=fix&dryRun=false&wt=json&core=alfresco&maxScheduledTransactions=1'
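
If you would rather preview before scheduling anything, a small helper can build the same request with dryRun switched on. This is only a sketch: fix_url and run_fix are hypothetical helper names, and it assumes that dryRun=true reports what would be fixed without scheduling it, as the parameter name suggests.

```shell
# Base URL of the Solr instance, as used throughout this guide.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the fix request URL for a core; the second argument is "true" or "false".
fix_url() {
  core="$1"
  dry_run="$2"
  echo "${SOLR_URL}/admin/cores?action=fix&dryRun=${dry_run}&wt=json&core=${core}&maxScheduledTransactions=1"
}

# Send the fix request for a core with the given dryRun setting.
run_fix() {
  curl -s "$(fix_url "$1" "$2")"
}
```

For example, run_fix alfresco true to preview, then run_fix alfresco false to apply the fix.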

Recommendations

In some cases, to reduce the reindex time and avoid gaps in the index, it helps to increase the Solr heap and/or remove the readiness and liveness probes.

Increase SOLR_HEAP

Set the SOLR_HEAP variable in the file helm/alfresco-content-services/charts/alfresco-search/templates/config.yaml, taking its value from alfresco-search.environment.heap:

data:
  # Increase heap to perform full reindex
  SOLR_HEAP: "{{ .Values.environment.heap }}"
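
With the template reading the heap from the values, the new size can be supplied at upgrade time. A minimal sketch, assuming Helm is used for the deployment: the release name (acs), chart path, namespace and the 4g value are hypothetical, and set_search_heap is just an illustrative helper name.

```shell
# Hypothetical defaults: replace with your release name, chart path and namespace.
RELEASE="${RELEASE:-acs}"
CHART="${CHART:-./helm/alfresco-content-services}"
NAMESPACE="${NAMESPACE:-alfresco}"

# Upgrade the release, overriding only the Solr heap value and keeping the other values.
set_search_heap() {
  heap="$1"
  helm upgrade "$RELEASE" "$CHART" -n "$NAMESPACE" \
    --reuse-values --set "alfresco-search.environment.heap=${heap}"
}
```

For example, set_search_heap 4g. Pick a size that fits the memory limits of the pod.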

Remove readiness and liveness probes

(to avoid false positives due to connection refused or connection timeout errors)

Remove probes from file: helm/alfresco-content-services/charts/alfresco-search/templates/deployment.yaml
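
As an alternative to editing the template, the probes can also be stripped from the running deployment with a JSON patch. This is a different technique from the template edit described above, shown only as a sketch: the deployment name (alfresco-search) and namespace are hypothetical, and it assumes the Solr container is the first container in the pod spec and has both probes defined (the patch fails if either is missing). Patching triggers a new rollout, so do it before starting the reindex, and note that a later Helm upgrade will restore the probes.

```shell
# Hypothetical defaults: replace with your namespace and deployment name.
NAMESPACE="${NAMESPACE:-alfresco}"
DEPLOYMENT="${DEPLOYMENT:-alfresco-search}"

# Remove the liveness and readiness probes from the first container of the deployment.
remove_probes() {
  kubectl patch deployment "$DEPLOYMENT" -n "$NAMESPACE" --type=json -p '[
    {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
    {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
  ]'
}
```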