How to monitor Synapse metrics using Prometheus
===============================================

1. Install Prometheus:

   Follow instructions at http://prometheus.io/docs/introduction/install/

2. Enable Synapse metrics:

   There are two methods of enabling metrics in Synapse.

   The first serves the metrics as part of the usual web server and can be
   enabled by adding the "metrics" resource to the existing listener, as
   follows::

     resources:
       - names:
         - client
         - metrics

   This provides a simple way of adding metrics to your Synapse installation,
   and serves them under ``/_synapse/metrics``. If you do not want your metrics
   to be publicly exposed, you will need to either filter this path out at your
   load balancer or use the second method.

   The second method runs the metrics server on a different port, in a
   different thread to Synapse. This can make it more resilient when heavy load
   would otherwise prevent metrics from being retrieved, and makes it easier to
   expose the metrics to internal networks only. The served metrics are
   available over HTTP only, at ``/``.

   Add a new listener to ``homeserver.yaml``::

     listeners:
       - type: metrics
         port: 9000
         bind_addresses:
           - '0.0.0.0'

   For both options, you will need to ensure that ``enable_metrics`` is set to
   ``True``.

   Restart Synapse.

3. Add a Prometheus target for Synapse.

   The scrape job needs to set ``metrics_path`` to a non-default value (under
   ``scrape_configs``)::

     - job_name: "synapse"
       metrics_path: "/_synapse/metrics"
       static_configs:
         - targets: ["my.server.here:port"]

   where ``my.server.here`` is the IP address of Synapse, and ``port`` is the
   listener port configured with the ``metrics`` resource.

   If your Prometheus is older than 1.5.2, you will need to replace
   ``static_configs`` in the above with ``target_groups``.

   Restart Prometheus.

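
After both services have been restarted, it can help to confirm that the
metrics endpoint returns parseable data before looking for it in Prometheus.
The following is a minimal sketch and not part of Synapse: ``parse_samples`` is
a hypothetical helper for the Prometheus text exposition format, it assumes
samples carry no trailing timestamp, and the sample text is illustrative rather
than real Synapse output. In practice the input would be the response body
fetched from the metrics URL configured above.

```python
def parse_samples(text):
    """Parse Prometheus text-format lines into a {series: value} dict.

    Hypothetical helper for illustration; assumes no trailing timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Illustrative input, not captured from a real server.
EXAMPLE = """\
# HELP process_cpu_seconds_total Total user and system CPU time.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
python_gc_collections_total{generation="0"} 42.0
"""

metrics = parse_samples(EXAMPLE)
print(metrics["process_cpu_seconds_total"])
```

An empty result for a real response would suggest that ``enable_metrics`` is
not set or that the wrong path or port is being queried.
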
Renaming of metrics & deprecation of old names in 1.2
-----------------------------------------------------

Synapse 1.2 updates the Prometheus metrics to match the naming convention of the
upstream ``prometheus_client``. The old names are considered deprecated and will
be removed in a future version of Synapse.

+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| New Name                                                                    | Old Name                                                              |
+=============================================================================+=======================================================================+
| python_gc_objects_collected_total                                           | python_gc_objects_collected                                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| python_gc_objects_uncollectable_total                                       | python_gc_objects_uncollectable                                       |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| python_gc_collections_total                                                 | python_gc_collections                                                 |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| process_cpu_seconds_total                                                   | process_cpu_seconds                                                   |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_client_sent_transactions_total                           | synapse_federation_client_sent_transactions                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_client_events_processed_total                            | synapse_federation_client_events_processed                            |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_event_processing_loop_count_total                                   | synapse_event_processing_loop_count                                   |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_event_processing_loop_room_count_total                              | synapse_event_processing_loop_room_count                              |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_count_total                                      | synapse_util_metrics_block_count                                      |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_time_seconds_total                               | synapse_util_metrics_block_time_seconds                               |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_ru_utime_seconds_total                           | synapse_util_metrics_block_ru_utime_seconds                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_ru_stime_seconds_total                           | synapse_util_metrics_block_ru_stime_seconds                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_db_txn_count_total                               | synapse_util_metrics_block_db_txn_count                               |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_db_txn_duration_seconds_total                    | synapse_util_metrics_block_db_txn_duration_seconds                    |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_util_metrics_block_db_sched_duration_seconds_total                  | synapse_util_metrics_block_db_sched_duration_seconds                  |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_background_process_start_count_total                                | synapse_background_process_start_count                                |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_background_process_ru_utime_seconds_total                           | synapse_background_process_ru_utime_seconds                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_background_process_ru_stime_seconds_total                           | synapse_background_process_ru_stime_seconds                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_background_process_db_txn_count_total                               | synapse_background_process_db_txn_count                               |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_background_process_db_txn_duration_seconds_total                    | synapse_background_process_db_txn_duration_seconds                    |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_background_process_db_sched_duration_seconds_total                  | synapse_background_process_db_sched_duration_seconds                  |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_storage_events_persisted_events_total                               | synapse_storage_events_persisted_events                               |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_storage_events_persisted_events_sep_total                           | synapse_storage_events_persisted_events_sep                           |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_storage_events_state_delta_total                                    | synapse_storage_events_state_delta                                    |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_storage_events_state_delta_single_event_total                       | synapse_storage_events_state_delta_single_event                       |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_storage_events_state_delta_reuse_delta_total                        | synapse_storage_events_state_delta_reuse_delta                        |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_server_received_pdus_total                               | synapse_federation_server_received_pdus                               |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_server_received_edus_total                               | synapse_federation_server_received_edus                               |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handler_presence_notified_presence_total                            | synapse_handler_presence_notified_presence                            |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handler_presence_federation_presence_out_total                      | synapse_handler_presence_federation_presence_out                      |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handler_presence_presence_updates_total                             | synapse_handler_presence_presence_updates                             |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handler_presence_timers_fired_total                                 | synapse_handler_presence_timers_fired                                 |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handler_presence_federation_presence_total                          | synapse_handler_presence_federation_presence                          |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handler_presence_bump_active_time_total                             | synapse_handler_presence_bump_active_time                             |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_client_sent_edus_total                                   | synapse_federation_client_sent_edus                                   |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_client_sent_pdu_destinations_count_total                 | synapse_federation_client_sent_pdu_destinations:count                 |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_federation_client_sent_pdu_destinations_total                       | synapse_federation_client_sent_pdu_destinations:total                 |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_handlers_appservice_events_processed_total                          | synapse_handlers_appservice_events_processed                          |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_notifier_notified_events_total                                      | synapse_notifier_notified_events                                      |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter_total | synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter_total   | synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter   |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_http_httppusher_http_pushes_processed_total                         | synapse_http_httppusher_http_pushes_processed                         |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_http_httppusher_http_pushes_failed_total                            | synapse_http_httppusher_http_pushes_failed                            |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_http_httppusher_badge_updates_processed_total                       | synapse_http_httppusher_badge_updates_processed                       |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+
| synapse_http_httppusher_badge_updates_failed_total                          | synapse_http_httppusher_badge_updates_failed                          |
+-----------------------------------------------------------------------------+-----------------------------------------------------------------------+

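
Dashboards and alerting rules that still use the old names need updating
before the deprecated names are removed. The following is a rough sketch of how
such a migration might be scripted. ``RENAMES`` and ``migrate_expression`` are
hypothetical names introduced here, and the mapping holds only a few rows of
the table above. The replacement is deliberately naive: it rewrites any token
that matches an old name, including one that happens to appear inside a label
value, so results should be reviewed by hand.

```python
import re

# A small subset of the renames listed in the table: old name -> new name.
RENAMES = {
    "process_cpu_seconds": "process_cpu_seconds_total",
    "synapse_notifier_notified_events": "synapse_notifier_notified_events_total",
    "synapse_federation_client_sent_pdu_destinations:count":
        "synapse_federation_client_sent_pdu_destinations_count_total",
    "synapse_federation_client_sent_pdu_destinations:total":
        "synapse_federation_client_sent_pdu_destinations_total",
}

# Metric names may contain letters, digits, underscores and colons.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def migrate_expression(expr):
    """Replace deprecated metric names in a query string with the new names."""
    # Each whole token is looked up, so names already migrated are untouched.
    return METRIC_NAME.sub(lambda m: RENAMES.get(m.group(0), m.group(0)), expr)


print(migrate_expression("rate(process_cpu_seconds[5m])"))
```

Matching whole tokens rather than substrings means that running the function
twice over the same expression does not append ``_total`` a second time.
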
Removal of deprecated metrics & time based counters becoming histograms in 0.31.0
---------------------------------------------------------------------------------

The duplicated metrics deprecated in Synapse 0.27.0 have been removed.

All duration-based metrics are now reported in seconds rather than
milliseconds. This affects:

+----------------------------------+
| msec -> sec metrics              |
+==================================+
| python_gc_time                   |
+----------------------------------+
| python_twisted_reactor_tick_time |
+----------------------------------+
| synapse_storage_query_time       |
+----------------------------------+
| synapse_storage_schedule_time    |
+----------------------------------+
| synapse_storage_transaction_time |
+----------------------------------+

Several metrics have been changed to be histograms, which sort entries into
buckets and allow better analysis. The following metrics are now histograms:

+-------------------------------------------+
| Altered metrics                           |
+===========================================+
| python_gc_time                            |
+-------------------------------------------+
| python_twisted_reactor_pending_calls      |
+-------------------------------------------+
| python_twisted_reactor_tick_time          |
+-------------------------------------------+
| synapse_http_server_response_time_seconds |
+-------------------------------------------+
| synapse_storage_query_time                |
+-------------------------------------------+
| synapse_storage_schedule_time             |
+-------------------------------------------+
| synapse_storage_transaction_time          |
+-------------------------------------------+

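
To illustrate why bucketed histograms allow better analysis, the sketch below
estimates a quantile from cumulative bucket counts using linear interpolation
within a bucket. This is the same general idea as PromQL's
``histogram_quantile`` function, but it is a simplified illustration and not
Prometheus's implementation. ``estimate_quantile`` is a hypothetical helper,
and the bucket bounds and counts are invented for the example.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        # Pick the first non-empty bucket whose cumulative count covers rank.
        if count >= rank and count > lower_count:
            if math.isinf(upper_bound):
                # The open-ended bucket has no width to interpolate across.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented data: 40 observations, bounds in seconds.
buckets = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
print(estimate_quantile(0.5, buckets))
```

With these invented counts the median is estimated at roughly 0.3 seconds,
halfway through the 0.1 to 0.5 bucket. The precision of such an estimate is
limited by the width of the bucket it falls into.
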
Block and response metrics renamed for 0.27.0
---------------------------------------------

Synapse 0.27.0 begins the process of rationalising the duplicate ``*:count``
metrics reported for the resource tracking for code blocks and HTTP requests.

At the same time, the corresponding ``*:total`` metrics are being renamed, as
the ``:total`` suffix no longer makes sense in the absence of a corresponding
``:count`` metric.

To enable a graceful migration path, this release just adds new names for the
metrics being renamed. A future release will remove the old ones.

The following table shows the new metrics, and the old metrics which they are
replacing.

==================================================== ===================================================
New name                                             Old name
==================================================== ===================================================
synapse_util_metrics_block_count                     synapse_util_metrics_block_timer:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_db_txn_duration:count
synapse_util_metrics_block_time_seconds              synapse_util_metrics_block_timer:total
synapse_util_metrics_block_ru_utime_seconds          synapse_util_metrics_block_ru_utime:total
synapse_util_metrics_block_ru_stime_seconds          synapse_util_metrics_block_ru_stime:total
synapse_util_metrics_block_db_txn_count              synapse_util_metrics_block_db_txn_count:total
synapse_util_metrics_block_db_txn_duration_seconds   synapse_util_metrics_block_db_txn_duration:total
synapse_http_server_response_count                   synapse_http_server_requests
synapse_http_server_response_count                   synapse_http_server_response_time:count
synapse_http_server_response_count                   synapse_http_server_response_ru_utime:count
synapse_http_server_response_count                   synapse_http_server_response_ru_stime:count
synapse_http_server_response_count                   synapse_http_server_response_db_txn_count:count
synapse_http_server_response_count                   synapse_http_server_response_db_txn_duration:count
synapse_http_server_response_time_seconds            synapse_http_server_response_time:total
synapse_http_server_response_ru_utime_seconds        synapse_http_server_response_ru_utime:total
synapse_http_server_response_ru_stime_seconds        synapse_http_server_response_ru_stime:total
synapse_http_server_response_db_txn_count            synapse_http_server_response_db_txn_count:total
synapse_http_server_response_db_txn_duration_seconds synapse_http_server_response_db_txn_duration:total
==================================================== ===================================================

Standard Metric Names
---------------------

As of Synapse version 0.18.2, the format of the process-wide metrics has been
changed to fit Prometheus standard naming conventions. Additionally, the units
have been changed from milliseconds to seconds.

================================== =============================
New name                           Old name
================================== =============================
process_cpu_user_seconds_total     process_resource_utime / 1000
process_cpu_system_seconds_total   process_resource_stime / 1000
process_open_fds (no 'type' label) process_fds
================================== =============================

The Python-specific garbage collector metrics have been renamed.

=========================== ======================
New name                    Old name
=========================== ======================
python_gc_time              reactor_gc_time
python_gc_unreachable_total reactor_gc_unreachable
python_gc_counts            reactor_gc_counts
=========================== ======================

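The Twisted-specific reactor metrics have been renamed.

==================================== =====================
New name                             Old name
==================================== =====================
python_twisted_reactor_pending_calls reactor_pending_calls
python_twisted_reactor_tick_time     reactor_tick_time
==================================== =====================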
  236. python_twisted_reactor_pending_calls reactor_pending_calls
  237. python_twisted_reactor_tick_time reactor_tick_time
  238. ==================================== =====================