RabbitMQ can't be started after being stopped incorrectly – Virtuozzo IaaS (OnApp) Support


Issue

rabbitmq-server cannot be started, and a status check returns the following error:

[root@cp rabbitmq]# /etc/init.d/rabbitmq-server status
Status of node rabbit@cp ...
Error: unable to connect to node rabbit@cp: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@cp]

rabbit@cp:
  * connected to epmd (port 4369) on cp
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on cp
  * suggestion: start the node

current node details:
- node name: 'rabbitmq-cli-52@cp'
- home dir: /var/lib/rabbitmq
- cookie hash: oOKvDyZvvRIrslg6Hbd8sQ==

Environment

OnApp 4.2 and later

Resolution

The root cause is recorded in the default RabbitMQ log directory (/var/log/rabbitmq), specifically in the "rabbit@cp.log" file. The following two root causes are the most common, and both are easy to identify and fix:

1. The following output means that the RabbitMQ queues have been corrupted:

[root@cp ~]# tail -n 25 /var/log/rabbitmq/rabbit\@cp.log
=INFO REPORT==== 4-Jan-2017::22:57:24 ===
Error description:
   {could_not_start,rabbit,
       {{badmatch,
            {error,
                {{{{case_clause,undefined},
                   [{rabbit_queue_index,add_segment_relseq_entry,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,1091}]},
                    {rabbit_queue_index,parse_segment_entries,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,1075}]},
                    {rabbit_queue_index,'-recover_journal/1-fun-0-',1,
                        [{file,"src/rabbit_queue_index.erl"},{line,863}]},
                    {lists,map,2,[{file,"lists.erl"},{line,1239}]},
                    {rabbit_queue_index,segment_map,2,
                        [{file,"src/rabbit_queue_index.erl"},{line,989}]},
                    {rabbit_queue_index,recover_journal,1,
                        [{file,"src/rabbit_queue_index.erl"},{line,856}]},
                    {rabbit_queue_index,scan_segments,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,676}]},
                    {rabbit_queue_index,queue_index_walker_reader,2,
                        [{file,"src/rabbit_queue_index.erl"},{line,664}]}]},
                  {gen_server2,call,[<0.288.0>,out,infinity]}},

The issue can be resolved by moving all existing queue directories out of the RabbitMQ working directory:

[root@cp ~]# ls /var/lib/rabbitmq/mnesia/rabbit@cp/queues
17BD92SOBZHLASJ4UE5VT5EHQ 3UM4MIP6TJ0PEAHMBSCNUX731 621MT7APTTVO66OSRLB1R7FY5 9R728VE21EGPVSMK9AJ7YH67E A4SL081HP0B21UALPRHLK069R ACJVPWBZHI24WLV6QHBPNT5JE ETHYXYWKQVKD3Q2EYD84UWIS2
[root@cp ~]# mkdir -p /tmp/badrabbit/ && mv /var/lib/rabbitmq/mnesia/rabbit@cp/queues/* /tmp/badrabbit/
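
The move above can also be wrapped in a small helper that keeps each attempt in its own timestamped directory, so a second run does not mix files with an earlier one. This is only a minimal sketch: the function name quarantine_queues and the timestamped subdirectory layout are assumptions made for illustration and are not part of RabbitMQ or OnApp.

```shell
#!/bin/sh
# Sketch: move every queue directory out of the RabbitMQ data directory.
# quarantine_queues is a hypothetical helper name, not a RabbitMQ command.
quarantine_queues() {
    queues_dir=$1     # e.g. /var/lib/rabbitmq/mnesia/rabbit@cp/queues
    backup_root=$2    # e.g. /tmp/badrabbit

    if [ ! -d "$queues_dir" ]; then
        echo "no such directory: $queues_dir" >&2
        return 1
    fi

    # One subdirectory per run, named after the current date and time.
    dest="$backup_root/queues-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1

    # Move each entry individually; an empty queues directory is not an error.
    for entry in "$queues_dir"/*; do
        [ -e "$entry" ] || continue
        mv -- "$entry" "$dest"/ || return 1
    done

    # Print where the files went so they can be inspected or removed later.
    echo "$dest"
}
```

It would be called as quarantine_queues /var/lib/rabbitmq/mnesia/rabbit@cp/queues /tmp/badrabbit while rabbitmq-server is stopped. Keeping the moved directories rather than deleting them leaves room to inspect them later.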

2. The following output means that the DETS table file has been corrupted:


[root@cp ~]# tail -n 15 /var/log/rabbitmq/rabbit\@cp.log
=INFO REPORT==== 16-Jan-2017::09:11:03 ===
Error description:
   {could_not_start,rabbit,
       {{badmatch,
            {error,
                {{{badmatch,
                      {error,
                          {not_a_dets_file,
                              "/var/lib/rabbitmq/mnesia/rabbit@cp/recovery.dets"}}},
                  [{rabbit_recovery_terms,open_table,0,
                       [{file,"src/rabbit_recovery_terms.erl"},{line,126}]},
                   {rabbit_recovery_terms,init,1,
                       [{file,"src/rabbit_recovery_terms.erl"},{line,107}]},
                   {gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},

The issue can be resolved by moving the DETS file out of the RabbitMQ working directory:

[root@cp ~]# mkdir -p /tmp/badrabbit/ && mv /var/lib/rabbitmq/mnesia/rabbit@cp/recovery.dets /tmp/badrabbit/
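
To tell the two cases apart without reading the whole Erlang trace, the log can be searched for the strings that appear in the excerpts above. The following is a rough sketch under stated assumptions: detect_cause is a hypothetical helper name, the 200-line window is an arbitrary choice, and a match is only a hint that should be confirmed against the actual log output.

```shell
#!/bin/sh
# Sketch: report which known corruption signature appears near the end of a log.
# detect_cause is a hypothetical helper name; the two search strings are taken
# from the log excerpts in this article.
detect_cause() {
    log_file=$1    # e.g. /var/log/rabbitmq/rabbit@cp.log

    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 2
    fi

    # Look only at the most recent entries (window size is an assumption).
    recent=$(tail -n 200 "$log_file")
    found=no

    # Case 1: the stack trace goes through the queue index recovery code.
    case $recent in
        *rabbit_queue_index*) echo "queues"; found=yes ;;
    esac

    # Case 2: the recovery terms table is not a valid DETS file.
    case $recent in
        *not_a_dets_file*) echo "dets"; found=yes ;;
    esac

    [ "$found" = yes ] || echo "unknown"
    return 0
}
```

Running detect_cause /var/log/rabbitmq/rabbit@cp.log prints "queues", "dets", both, or "unknown". Anything reported as "unknown" still needs the log to be read manually.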

Cause

This issue can occur after a power outage or when the Control Panel server is restarted with the Reset button instead of being shut down cleanly.