The Cloud Logging and Cloud Monitoring Agents (previously known as Stackdriver) are lifesavers when it comes to supporting and operating servers in Google Cloud Platform (GCP). However, deploying these agents can sometimes be tricky. This article describes how you can quickly fix the two most common issues (in our experience) with them.
Quick note: if you haven’t been using these agents in GCP, you really should! Check out these handy links for deploying them to your VMs:
- Cloud Logging Agent Installation: https://cloud.google.com/logging/docs/agent/installation
- Cloud Monitoring Agent Installation: https://cloud.google.com/monitoring/agent/installation
Once installed, they send near-real-time data to your Cloud Monitoring and Cloud Logging consoles, giving you a live feed of your instances running in GCP.
Cloud Monitoring Agent swap file errors
When deploying the Cloud Monitoring Agent on Linux VM instances, you’ll sometimes see this error logged repeatedly: write_gcm: can not take infinite value. It occurs because the agent’s swap plugin is enabled by default and expects the VM to have a swap file. If you have not configured one, the plugin cannot produce a valid (finite) value to send, hence the repeated errors.
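Before editing any configuration, you may want to confirm that the VM really has no swap. The following is a minimal Python sketch, assuming a Linux guest that exposes the standard /proc/meminfo file: it reads the SwapTotal field, where a value of 0 kB means no swap is configured. The helper names swap_total_kb and swap_configured are hypothetical, chosen for illustration only.

```python
import os


def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal value, in kB, from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        # Lines look like: "SwapTotal:       2097148 kB"
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal field not found")


def swap_configured(meminfo_text: str) -> bool:
    """True if the kernel reports any swap space at all."""
    return swap_total_kb(meminfo_text) > 0


# Only touch the real file when run directly on a Linux machine.
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        if swap_configured(f.read()):
            print("Swap is configured; the swap plugin has real data to report.")
        else:
            print("No swap configured; this matches the write_gcm error scenario.")
```

The same information is available from tools such as free or swapon, but parsing /proc/meminfo keeps the check dependency-free.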
To resolve this issue with the Cloud Monitoring Agent, locate the following lines in the configuration file /etc/stackdriver/collectd.conf and remove them (we commonly use vim for this, but any text editor will work):
LoadPlugin swap
<Plugin "swap">
    ValuesPercentage true
</Plugin>
After removing the lines above, restart the Cloud Monitoring Agent with the following command: sudo service stackdriver-agent restart
Credit to https://myshittycode.com/2020/06/13/gcp-stackdriver-agent-write_gcm-can-not-take-infinite-value-error/ for this solution, which has no doubt helped many people (us included!).
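If you manage many VMs, you may prefer to script the edit rather than open vim on each one. The Python sketch below is one way to do it, assuming the swap settings appear in collectd.conf as the LoadPlugin line and <Plugin "swap"> block shown above. The helper names strip_swap_plugin and patch_collectd_conf are hypothetical, and the script needs root to write under /etc.

```python
import re
import shutil

# Default path from the article; adjust if your install differs.
COLLECTD_CONF = "/etc/stackdriver/collectd.conf"

# The "LoadPlugin swap" line (name optionally quoted), matched as a whole line.
_LOAD_SWAP = re.compile(r'^[ \t]*LoadPlugin[ \t]+"?swap"?[ \t]*\r?$\n?', re.MULTILINE)

# A <Plugin "swap"> ... </Plugin> block, including everything inside it.
_SWAP_BLOCK = re.compile(
    r'^[ \t]*<Plugin[ \t]+"?swap"?[ \t]*>.*?^[ \t]*</Plugin>[ \t]*\r?$\n?',
    re.MULTILINE | re.DOTALL,
)


def strip_swap_plugin(conf_text: str) -> str:
    """Return conf_text without the swap plugin's LoadPlugin line and config block."""
    conf_text = _LOAD_SWAP.sub("", conf_text)
    return _SWAP_BLOCK.sub("", conf_text)


def patch_collectd_conf(path: str = COLLECTD_CONF) -> None:
    """Back up the file to <path>.bak, then rewrite it without the swap plugin."""
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        original = f.read()
    with open(path, "w") as f:
        f.write(strip_swap_plugin(original))
```

For example, call patch_collectd_conf() as root, then restart the agent with sudo service stackdriver-agent restart. The .bak copy lets you roll back if your collectd.conf differs from the layout assumed here.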
Cloud Logging Agent consumes large amounts of CPU and does not log data to Cloud Logging
When deploying the Cloud Logging Agent on Windows Server instances in GCP, you’ll sometimes see the Ruby interpreter consistently consume a large amount of CPU while no logs ever arrive in Cloud Logging’s winevt.raw log. In our experience, this normally happens with older Windows Server images.
To diagnose this issue, inspect the Cloud Logging Agent’s own log file at C:\Program Files (x86)\Stackdriver\LoggingAgent\fluentd.log. If the agent is stuck in a loop of being terminated and restarting itself, you will see lines like these repeated:
[info]: #0 Initialized the insert ID key to xxxxxxxxxxxxx
[info]: #0 fluentd worker is now running worker=0
[info]: Worker 0 finished unexpectedly with signal SIGSEGV
[info]: #0 Initialized the insert ID key to xxxxxxxxxxxxx
[info]: #0 fluentd worker is now running worker=0
[info]: Worker 0 finished unexpectedly with signal SIGSEGV
This indicates an issue with the gRPC library used by the agent’s Ruby interpreter, described in further detail here: https://github.com/grpc/grpc/issues/7876. To resolve it, open C:\Program Files (x86)\Stackdriver\LoggingAgent\fluent.conf in a text editor and locate the following line:
use_grpc true
And change that to:
use_grpc false
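If you need to apply this change to several Windows instances, the edit can be scripted. The Python sketch below assumes Python 3 is installed on the instance (the agent does not provide it), that the setting appears exactly as use_grpc true on its own line, and that it is run from an elevated (Administrator) prompt so it can write under Program Files. The helper names disable_grpc and patch_fluent_conf are hypothetical.

```python
import re
from typing import Tuple

# Default path of the Windows agent's config file, as given in the article.
FLUENT_CONF = r"C:\Program Files (x86)\Stackdriver\LoggingAgent\fluent.conf"

# "use_grpc true" on its own line; group 1 keeps the indentation and key.
_USE_GRPC_TRUE = re.compile(r"^([ \t]*use_grpc[ \t]+)true[ \t]*$", re.MULTILINE)


def disable_grpc(conf_text: str) -> Tuple[str, int]:
    """Return (patched_text, number_of_lines_changed)."""
    return _USE_GRPC_TRUE.subn(r"\1false", conf_text)


def patch_fluent_conf(path: str = FLUENT_CONF) -> int:
    """Rewrite the file only if something changed, keeping a .bak copy."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    patched, count = disable_grpc(original)
    if count:
        with open(path + ".bak", "w", encoding="utf-8") as f:
            f.write(original)
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return count
```

A return value of 0 means no use_grpc true line was found, either because the file is already patched or because it is laid out differently than assumed, in which case it is worth checking by hand.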
After that, restart the StackdriverLogging service. The Ruby interpreter should start up and gradually settle down to low CPU consumption. Finally, check Cloud Logging to confirm that the Cloud Logging Agent has automatically created the winevt.raw log; this may take several minutes after startup, as the agent first polls all the existing logs.
Note: According to Google Support, in response to a ticket we raised, this will be fixed in a future release of the agent, so stay tuned!
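To confirm that the crash loop has actually stopped after the fix, you can count worker starts and SIGSEGV exits in fluentd.log. This is a minimal Python sketch that matches on the message text from the sample log lines shown earlier; the helper names and the crash threshold of 3 are arbitrary assumptions, not values from the agent or from Google.

```python
import os
from collections import Counter

# Default path of the Windows agent's own log, as given in the article.
FLUENTD_LOG = r"C:\Program Files (x86)\Stackdriver\LoggingAgent\fluentd.log"

CRASH_MARKER = "finished unexpectedly with signal SIGSEGV"
START_MARKER = "fluentd worker is now running"


def summarize(lines) -> Counter:
    """Count worker starts and SIGSEGV crashes in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if CRASH_MARKER in line:
            counts["crashes"] += 1
        elif START_MARKER in line:
            counts["starts"] += 1
    return counts


def looks_like_crash_loop(lines, threshold: int = 3) -> bool:
    """Heuristic (threshold is an assumption): repeated SIGSEGV exits."""
    return summarize(lines)["crashes"] >= threshold


if __name__ == "__main__" and os.path.exists(FLUENTD_LOG):
    with open(FLUENTD_LOG, encoding="utf-8", errors="replace") as f:
        recent = f.readlines()[-200:]  # only look at the most recent lines
    print(summarize(recent))
```

Because the log keeps its history, looking only at the most recent lines (here, an arbitrary 200) avoids counting crashes that happened before the restart.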
Further troubleshooting for the Cloud Logging and Cloud Monitoring Agents
If the steps above did not solve your issue, your first stop should be the official troubleshooting guides:
- Troubleshooting the Cloud Logging Agent: https://cloud.google.com/logging/docs/agent/troubleshooting
- Troubleshooting the Cloud Monitoring Agent: https://cloud.google.com/monitoring/agent/troubleshooting
If the Google Cloud articles above do not resolve your issue, reach out to us at https://www.matrixc.com/contact-us/ for assistance; we would be happy to help!