Metrics

Alfred Inflow uses Micrometer, an application metrics facade that supports numerous monitoring systems, to expose its application metrics for monitoring.

Registration and exporting of metrics is enabled by default and includes all endpoints. It can be disabled globally with the following property:
management.endpoints.web.exposure.exclude=*
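As a sketch, assuming the client follows standard Spring Boot actuator conventions, a typical application.properties could look like this (values are examples, not defaults shipped with Inflow):

```properties
# Disable all metric endpoints globally:
management.endpoints.web.exposure.exclude=*

# ...or instead expose only the Prometheus scrape endpoint:
# management.endpoints.web.exposure.include=prometheus
```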

Prerequisites

Alfred Inflow consists of a standalone application (“the client”) and a module that should be installed in Alfresco (“the backend”). To expose metrics in the Alfresco module, Inflow depends on Alfred Telemetry. The metrics exposed by the backend module can be recognized by the inflow.backend prefix. If Alfred Telemetry is not installed in the target Alfresco, none of these metrics are available.

For a detailed overview of how to expose metrics via Alfred Telemetry to various monitoring systems, have a look at the documentation of that project.

The supported monitoring systems and their configuration discussed in the next section are part of the client application. The supported metrics are exposed by the client, unless their name starts with inflow.backend.

A practical example: to monitor the full set of metrics with e.g. Prometheus, Prometheus should be configured to scrape both the {inflow-server}:{management.server.port}/actuator/prometheus and the {alfresco-server}/alfresco/s/alfred/telemetry/prometheus endpoints.
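A sketch of such a scrape configuration, with placeholder host names and ports that you would replace with your own deployment values:

```yaml
# prometheus.yml (sketch) - scrape both the Inflow client and the Alfresco backend
scrape_configs:
  - job_name: 'inflow-client'
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['inflow-server:8081']     # use your management.server.port
  - job_name: 'inflow-backend'
    metrics_path: /alfresco/s/alfred/telemetry/prometheus
    static_configs:
      - targets: ['alfresco-server:8080']
```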

Supported monitoring systems

Although Micrometer supports a wide range of monitoring systems, Alfred Inflow only supports the ones mentioned in this section. For each monitoring system, the exporting of metrics can be enabled or disabled via the corresponding control property.

Prometheus

Control Property: management.endpoint.prometheus.enabled (default: enabled)

Prometheus expects to scrape or poll individual application instances for metrics. Alfred Inflow provides an endpoint at /actuator/prometheus that presents the metrics in the format Prometheus expects.

If metrics are disabled globally, or exporting metrics to Prometheus is disabled, requests to the endpoint will result in an HTTP status code 404.

Alfred Inflow includes several timers that record the duration of specific parts of the ingestion process. The Micrometer documentation contains specific information on graphing such timers when using Prometheus.
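As an illustration: Micrometer typically renders a timer to Prometheus as _seconds_count and _seconds_sum series (the exact names depend on the naming convention in use), so the average duration of e.g. the inflow.client.store-reports timer over the last five minutes could be graphed as:

```promql
rate(inflow_client_store_reports_seconds_sum[5m])
  / rate(inflow_client_store_reports_seconds_count[5m])
```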

JMX

Control Property: management.metrics.export.jmx.enabled (default: disabled)

Micrometer provides a hierarchical mapping to JMX, primarily as a cheap and portable way to view metrics locally. By default, metrics are exported to the metrics JMX domain.

Micrometer provides a default HierarchicalNameMapper that governs how a dimensional meter id is mapped to flat hierarchical names.
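To illustrate what such a flattening looks like, here is a small self-contained sketch (not Micrometer's actual implementation) that appends the tag key/value pairs to the meter name in alphabetical key order:

```java
import java.util.Map;
import java.util.TreeMap;

class FlatNameSketch {
    // Flatten a dimensional meter id by appending tag key/value pairs to the
    // name in alphabetical key order (spaces in values become underscores).
    static String toHierarchicalName(String name, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder(name);
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            sb.append('.').append(tag.getKey())
              .append('.').append(tag.getValue().replace(' ', '_'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHierarchicalName(
                "inflow.client.processed.count",
                Map.of("job-name", "invoices", "status", "OK")));
        // prints: inflow.client.processed.count.job-name.invoices.status.OK
    }
}
```

With a mapping like this, a dimensional meter such as inflow.client.processed.count with tags job-name=invoices and status=OK shows up in JMX as a single flat name.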

Supported metrics

Inflow domain - job specific

The following metrics relate to Inflow jobs and / or have a job-name tag, which makes it possible to inspect them for each job individually.

  • inflow.client.jobs.count
    Tags: status (INACTIVE or ACTIVE)
    Control property: management.endpoint.job-service.enabled (default: enabled)
    The number of jobs registered in the monitored Inflow instance.
  • inflow.client.jobs.fs.input.count
    Tags: job-name
    Control property: management.endpoint.job-service.fs.enabled (default: disabled)
    The number of input files in the input folders of each job. Note that this metric is disabled by default, because counting these files can be an expensive operation that may cause undesired side effects on the system.
  • inflow.client.processed.count
    Tags: job-name, status (INPUTQUEUE, OK or FAILED)
    The number of files processed by a specific job.
  • inflow.client.available-reports
    Tags: job-name
    The number of reports available in the Alfresco backend. In theory the client always downloads reports as soon as possible, so this number should approach zero. If the version of the installed backend is older than 3.5, the job-name tag may have the value UNKNOWN.
  • inflow.client.remaining-capacity
    Tags: job-name
    The remaining capacity of the Alfresco backend. At the time of writing the maximum capacity is 5000, which means the value of this metric will be between 0 and 5000. If the version of the installed backend is older than 3.5, the job-name tag may have the value UNKNOWN.

Inflow domain - ingestion process load

Next to job-related metrics, Inflow also exposes metrics that give insight into the general load of the ingestion process. These metrics make it possible to discover performance bottlenecks.

Inflow client: storing reports

  • inflow.client.store-reports
    A timer that records the amount of time spent writing reports to the Inflow database.
  • inflow.client.store-reports.count
    The total number of reports written to the Inflow database.

Inflow client: connection to Alfresco

The client application communicates with the backend module via HTTP requests. The following metrics give insight into these requests and their throughput.

  • httpcomponents.httpclient.request.sent.bytes
    The total number of bytes sent to Alfresco.
  • httpcomponents.httpclient.request.received.bytes
    The total number of bytes received from Alfresco.
  • httpcomponents.httpclient.request
    A timer that records the duration of the requests to Alfresco.

All these metrics relate to a specific logical operation Inflow performs as part of the communication process between client and backend. The logical operation (e.g. CREATE_JOB, UPLOAD_DOCUMENTS, GET_REPORTS, …) is added as the uri tag to the metric.

Inflow backend: storing content

Related to a specific step of the ingestion process: storing the physical content to the Alfresco content store.

  • inflow.backend.store-content
    A timer that records the amount of time spent by the Inflow backend to store the content.
  • inflow.backend.store-content.size
    The total amount of physical content stored by Alfred Inflow, in bytes. This metric can also be used to graph the throughput of storing content in bytes/sec. 
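For example, assuming this counter is rendered to Prometheus as inflow_backend_store_content_size_bytes_total (the exact name depends on Micrometer's naming convention), the throughput in bytes/sec could be graphed as:

```promql
rate(inflow_backend_store_content_size_bytes_total[1m])
```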

Inflow backend: create, update or delete nodes

Related to a specific step of the ingestion process: creating, updating or deleting nodes in the Alfresco database.

Most metrics come in two units: documents or packages. A package is a set of documents that should always be uploaded or modified together in Alfresco. For more information, have a look at the enable packaging configuration option in the user guide.

As a performance optimization, Inflow always tries to upload or modify a batch of packages in a single Alfresco transaction, which decreases the total overhead caused by the transaction itself. If this batch upload fails, each package is retried in an individual transaction. Metrics are available for both operations (“batched” vs “retry”). If too many retries are necessary, performance suffers and it is worth investigating what causes the original operation to fail.

  • inflow.backend.create-packages
    A timer that records the amount of time spent by the Inflow backend to create or modify nodes, as a batch of packages in a single transaction.
  • inflow.backend.create-packages.packages.counter
    The total number of packages created in the initial batch operation.
  • inflow.backend.create-packages.documents.counter
    The total number of documents created in the initial batch operation.
  • inflow.backend.retry-create-package
    A timer that records the amount of time spent by the Inflow backend to create or modify nodes, retried as a single package after the initial batch operation failed. In an ideal scenario all nodes are created in the initial batch operation and Inflow shouldn’t spend time in this retry operation.
  • inflow.backend.retry-create-package.packages.counter
    The total number of packages created in the retry operation. In an ideal scenario this is zero.
  • inflow.backend.retry-create-package.documents.counter
    The total number of documents created in the retry operation. In an ideal scenario this is zero.
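One way to keep an eye on excessive retries is to graph the fraction of documents that needed a retry; a PromQL sketch (the rendered metric names depend on Micrometer's Prometheus naming convention):

```promql
rate(inflow_backend_retry_create_package_documents_counter_total[5m])
  / rate(inflow_backend_create_packages_documents_counter_total[5m])
```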

Inflow backend: find existing nodes

Related to a specific step of the ingestion process: matching existing Alfresco nodes to the new upload. The amount of time spent in this operation should be as close to zero as possible. However, it is possible to extend this logic by providing custom NodeLocatorService implementations, so it is good to have insight into the impact of these implementations. Please consult the developer documentation for more information.

  • inflow.backend.find-existing-node
    A timer that records the amount of time spent by the Inflow backend to match existing nodes to the new upload.

Inflow backend: buffer monitoring

Certain operations of the Inflow backend can be seen as processing buffers. Inflow provides metrics to get insight into these buffers through the following counters:

  • inflow.backend.documents.in.counter
  • inflow.backend.documents.out.counter
  • inflow.backend.packages.in.counter
  • inflow.backend.packages.out.counter

All these counters have an operation tag with one of these values:

  • all
    The number of documents / packages that have been accepted (in) or processed (out) by the backend. The difference between the in and out counters gives an idea of the number of documents / packages the backend is currently processing.
  • create-ancestors
    The number of documents / packages for which Inflow is creating the ancestor nodes (= folders).
  • create-node
    The number of documents / packages for which Inflow is creating or modifying the actual document or package.
  • validate-node
    After nodes have been created, Inflow performs a quick sanity check to make sure the document actually exists in Alfresco. This metric indicates the number of documents for which Inflow is performing this check.
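The in-flight number of documents described above could, for instance, be derived in Prometheus as follows (metric names are illustrative and depend on how the counters are rendered):

```promql
inflow_backend_documents_in_counter_total{operation="all"}
  - inflow_backend_documents_out_counter_total{operation="all"}
```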

There are also multiple metrics related to the Java Virtual Machine and the rest of the system, which can be enabled or disabled individually.

Uptime metrics

The uptime metrics binding provides system uptime metrics.

Metrics provided

  • process.uptime
    The uptime of the Java virtual machine.
  • process.start.time
    Start time of the process since unix epoch.

JVM memory metrics

Metrics provided

  • jvm.buffer.count
    An estimate of the number of buffers in the pool.
  • jvm.buffer.memory.used
    An estimate of the memory that the Java virtual machine is using for this buffer pool.
  • jvm.buffer.total.capacity
    An estimate of the total capacity of the buffers in this pool.
  • jvm.memory.committed
    The amount of memory in bytes that is committed for the Java virtual machine to use.
  • jvm.memory.max
    The maximum amount of memory in bytes that can be used for memory management.
  • jvm.memory.used
    The amount of used memory.

JVM heap pressure metrics

Metrics provided

  • jvm.memory.usage.after.gc
    The percentage of old gen heap used after the last GC event, in the range [0..1].
  • jvm.gc.overhead
    An approximation of the percent of CPU time used by GC activities over the last lookback period or since monitoring began, whichever is shorter, in the range [0..1].

JVM GC metrics

Metrics provided

  • jvm.gc.max.data.size
    Max size of old generation memory pool.
  • jvm.gc.live.data.size
    Size of old generation memory pool after a full GC.
  • jvm.gc.memory.promoted
    Count of positive increases in the size of the old generation memory pool before GC to after GC.
  • jvm.gc.memory.allocated
    Incremented for an increase in the size of the young generation memory pool after one GC to before the next.
  • jvm.gc.concurrent.phase.time
    Time spent in concurrent phase.
  • jvm.gc.pause
    Time spent in GC pause.

JVM thread metrics

Metrics provided

  • jvm.threads.peak
    The peak live thread count since the Java virtual machine started or peak was reset.
  • jvm.threads.daemon
    The current number of live daemon threads.
  • jvm.threads.live
    The current number of live threads including both daemon and non-daemon threads.
  • jvm.threads.states
    The current number of threads per thread state. The state is added as a label to the metric.

Process threads

Metrics provided

  • process.threads
    The number of process threads.

Class loader metrics

Metrics provided

  • jvm.classes.loaded
    The number of classes that are currently loaded in the Java virtual machine.
  • jvm.classes.unloaded
    The total number of classes unloaded since the Java virtual machine has started execution.

Processor metrics

The processor metrics binding provides system processor metrics.

Metrics provided

  • system.cpu.count
    The number of processors available to the Java virtual machine.
  • system.load.average.1m
    The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time.
  • system.cpu.usage
    The “recent cpu usage” for the whole system.
  • process.cpu.usage
    The “recent cpu usage” for the Java Virtual Machine process.

File Descriptor metrics

The file descriptor metrics binding provides system file descriptor metrics.

Metrics provided

  • process.files.open
    The open file descriptor count.
  • process.files.max
    The maximum file descriptor count.