
Lumar Log File Integration: Logz.io


At Lumar, we understand that log file analysis is one of the most critical data sources in search engine optimization. Log file data allows SEO teams to identify how search engine crawlers are accessing a website and can help troubleshoot crawling and indexing issues.

To help our customers easily pull log file data into Lumar, we have partnered with Logz.io.

Our team has written this guide to help customers answer the following questions:

  • What is Logz.io?
  • What is ELK?
  • How does Logz.io work with Lumar?
  • How do I set up and configure Logz.io and Lumar?

What is Logz.io?

Logz.io is a company that provides log management and log analysis services. Their platform combines ELK, delivered as a cloud service, with machine learning to derive new insights from machine data.

What is ELK?

ELK is an acronym used to describe a log management stack made up of three popular open-source projects: Elasticsearch, Logstash, and Kibana.

  • Elasticsearch is a search and data analytics project.
  • Logstash is a centralized logging and parsing project.
  • Kibana is a data visualization project.

ELK gives customers the ability to:

  1. Aggregate logs from all systems and applications
  2. Analyze these logs, and
  3. Create visualizations for application and infrastructure monitoring.
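
To make the aggregate-and-analyze steps concrete, here is a minimal sketch that queries an ELK stack's standard Elasticsearch "_search" endpoint for Googlebot hits, grouped by HTTP status code. The endpoint, index name, and field names are illustrative assumptions, not values from your Logz.io account:

    import requests

    # Hypothetical Elasticsearch endpoint and index name. In Logz.io's
    # hosted ELK stack these details are managed for you behind their API.
    ES_URL = "http://localhost:9200/access-logs/_search"

    # Aggregate Googlebot log events by HTTP status code.
    query = {
        "size": 0,
        "query": {"match": {"agent": "Googlebot"}},
        "aggs": {"by_status": {"terms": {"field": "response"}}},
    }

    resp = requests.post(ES_URL, json=query, timeout=30)
    resp.raise_for_status()
    for bucket in resp.json()["aggregations"]["by_status"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])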
 

How does Logz.io work with Lumar?

Logz.io and Lumar work together as follows:

  1. An account is created in Logz.io.
  2. Log files are shipped from the web server to Logz.io.
  3. Log files are aggregated and stored in Logz.io.
  4. An API token is generated in the Logz.io account.
  5. This API token is then saved in Lumar.
  6. The API token is then used in the setup of the Logz.io connection in Lumar.
  7. A query is then created in Lumar to fetch log file data through the API.
  8. Lumar sends an authentication request to the API using the API token.
  9. Logz.io API accepts the token and allows Lumar to start requesting log file data based on the query.
  10. The log file data is crawled, processed, and visualized in Lumar, along with other data sources.
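
Steps 8 and 9 happen behind the scenes, so you never need to call the API yourself, but for the curious, the exchange looks roughly like the sketch below. It is based on Logz.io's public Search API, which authenticates via an X-API-TOKEN header and accepts an Elasticsearch-style query; check Logz.io's current API documentation for exact details, and note that the field name here is illustrative:

    import requests

    API_TOKEN = "YOUR-LOGZIO-API-TOKEN"  # the token generated in step 4

    # Request log records for Googlebot hits via Logz.io's Search API.
    response = requests.post(
        "https://api.logz.io/v1/search",
        headers={"X-API-TOKEN": API_TOKEN},
        json={
            "size": 100,
            "query": {"match": {"agent": "Googlebot"}},  # "agent" is illustrative
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json()["hits"]["total"])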
 

How to set up and configure Logz.io and Lumar

When integrating with Logz.io, it is crucial to make sure:

  1. Log files are being shipped to the Logz.io ELK stack, and
  2. Lumar is set up to access log files from Logz.io.

Let’s go through these two steps in detail:

1. Shipping log files to Logz.io

Before Lumar can crawl log files, they need to be shipped to the Logz.io ELK stack for storage and processing.

Logz.io provides a list of all the shipping solutions in their documentation and the dashboard.

Hover over “Logs” in the left side navigation bar:
how to integrate Lumar and Logz.io - go to Logz.io navigation bar

Move down to “Send your logs” in the pop-up menu and click on it:

Logz.io Nav Menu - Manage Data - Send your Logs
Our team strongly recommends going through the shipping options with your engineering or IT teams to better understand how to ship logs into Logz.io for your web server.

The most common log shipping methods for Logz.io are Filebeat and Amazon S3 fetcher.

Filebeat

Filebeat is a lightweight open-source log shipping agent installed on a customer’s HTTP server. Logz.io recommends it because it is the easiest way to get logs into their system.

Logz.io also provides documentation on how to set up Filebeat on the most common web servers.
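
For orientation, a filebeat.yml for shipping NGINX access logs to Logz.io looks roughly like the sketch below. Treat it as illustrative only: the shipping token (which is different from the API token Lumar uses later), the listener host and port, and the certificate path all depend on your Logz.io account and region, so follow Logz.io's own Filebeat documentation for the exact values.

    # Sketch of a filebeat.yml shipping NGINX access logs to Logz.io.
    filebeat.inputs:
      - type: log
        paths:
          - /var/log/nginx/access.log
        fields:
          logzio_codec: plain
          token: YOUR-LOGZIO-SHIPPING-TOKEN   # from your Logz.io account
          type: nginx_access
        fields_under_root: true

    output.logstash:
      hosts: ["listener.logz.io:5015"]        # region-specific listener
      ssl:
        certificate_authorities: ["/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt"]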

Amazon S3 fetcher

Amazon Web Services can be used to ship logs from certain services (for example, CloudFront log files) to an S3 bucket, where the Logz.io Amazon S3 fetcher can be used to fetch the logs.

Our team recommends going through the documentation and technical setup around this option with your engineering and/or IT teams.

2. Allowing Lumar access to Logz.io

Once the log files have been shipped to the ELK stack in Logz.io, the Lumar connection needs to be set up.

1. Go to the Connected Apps page in Lumar. Once you’ve logged in, click on the user settings icon in the top right of the screen and choose ‘Connected Apps’.

Screenshot of the Lumar Analyze all projects screen with the user settings icon and Connected Apps option highlighted.

2. If you don’t have the Logz.io integration as part of your subscription, you’ll see the option to request the add-on.

Screenshot of the Lumar connected apps page with the Logz.io option highlighted. The text reads: "We offer a full integration with Logz.io and can help you setup your account. Once setup we will be able to automatically import up-to-date bot crawling information every time we crawl your website. Contact your account manager to get started". There is also a button marked 'Request add-on'.

Once the Logz.io integration is added to your subscription, you’ll see the option to add your Logz.io account.

Screenshot of the Lumar connected apps page once the Logz.io integration has been added to the subscription. The screen shows a message saying "You haven't added any Logz.io accounts to Lumar yet" and has a button to add a Logz.io account.

3. Click “Add Logz.io account” and navigate to Logz.io.

Screenshot of Lumar, showing the option to add a Logz.io account. The screen shows fields for a Label and Token from the Logz.io platform, with buttons for cancel and add.

4. Go to the Logz.io dashboard to generate an API token.

Select the cogwheel “Settings” icon in the lower-left Logz.io navigation pane:

Logz.io Nav Menu - Settings

Click on “Manage Tokens” in the pop-up menu:
Logz.io Navigation Menu for Lumar integration- Manage Tokens

5. On the “Manage Tokens” screen, click “API Tokens”, then “+ New API Token”.
Logz.io menu for integration with Lumar - Manage API tokens

Choose a name for the new token and click “Add.”
Logz.io navigation menu - Add new API token for integration with Lumar

6. Copy the API token, return to Lumar and paste it into the Logz.io account details on the Connected Apps page. Click the Add button to save the details.

Screenshot of Lumar, showing Logz.io account information, with a Label and Token from the Logz.io platform added, and buttons for cancel and add.

7. The API token will then be saved in the Connected Apps page in your account.

Screenshot showing the Logz.io section of the Lumar connected apps page, showing a Logz.io account connected. The screen shows the label, the end of the API token, and that the account is valid. There are also options to edit the details, delete the account or add another Logz.io account.
 

Adding log file data to a crawl

1. To select log files as a data source from Logz.io, navigate to Step 2 of the crawl setup.

Screenshot of step 2 of the Lumar crawl setup process, showing the sources that will be crawled. In this screen Log Summary is checked and highlighted. There is also an option to add a Logz.io query.

2. Scroll down to the Log Summary source, select Logz.io Log Manager and click on the “Add Logz.io Query” button.

Screenshot of step 2 of the Lumar crawl setup process, focused in on the Log Summary options, with Logz.io and the 'Add Logz.io query' button highlighted.

3. The “Add Logz.io Query” button opens a query builder which, by default, contains pre-filled values for the most common server log file setup (more information about these values below).

Screenshot showing the Logz.io query details, showing the token, URL field name, base URL, maximum number of URLs to fetch, date range, plus user agent field name, desktop and mobile user agent match and does not match options, and a JSON query filter.

4. Once the query builder is configured, hit the save button to allow Lumar to crawl URLs from the Logz.io API.

 

How to configure the query builder

The query builder can be used to customize the default values in Lumar.

Screenshot of the default values in a Logz.io query.

The query builder requires analysis and editing to make sure that Lumar can pull in log file data from Logz.io.

Our team has described each field below, along with how to check that each one is correctly set up.

Base URL

The base URL value is the HTTP scheme (e.g. https://) and domain (e.g. www.lumar.com), which will be prepended to relative URLs (e.g. /example.html) in the log file data.

If it is left blank, the primary domain in the project settings will be used by default.

Please make sure that the URLs in the log file data are from the primary domain you want to crawl; otherwise, Lumar will flag a large number of crawl errors.
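
As a minimal illustration of this prepending behaviour (using hypothetical values rather than Lumar's internal code):

    from urllib.parse import urljoin

    # Base URL from the query builder and a relative URL from the log data.
    base_url = "https://www.lumar.com"
    relative_url = "/example.html"

    print(urljoin(base_url, relative_url))  # https://www.lumar.com/example.html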

Token

The Logz.io token value is the API token used to connect Lumar to your Logz.io account.

Please make sure that the API token used is still active and was created in the correct account.

Date Range

The date range value is measured in days; Lumar will collect logs from the number of days entered into this field.

By default, the date range is set to 30 days. Check that your Logz.io account contains log file data for the date range used.
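
For context, a day-based window like this is commonly expressed in Elasticsearch-style queries using date math on a timestamp field. The sketch below illustrates the concept only; it is an assumption, not Lumar's actual query:

    # Illustrative only: a 30-day window as Elasticsearch-style date math.
    date_range_days = 30
    range_filter = {
        "range": {
            "@timestamp": {
                "gte": f"now-{date_range_days}d/d",  # start of the window
                "lte": "now",
            }
        }
    }
    print(range_filter)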

Desktop User Agent Match and Does Not Match

These fields tell Lumar to fetch only data from Logz.io that matches a specific desktop user-agent string and does not match another. This helps ensure the accuracy of the data.

The default options for desktop user agent are:

  • Match: (.*Googlebot.*http://www.google.com/bot.html.*)
  • Does Not Match: (.*Mobile.*)

If you require customized user-agent strings, please get in contact with your customer success manager.

Mobile User Agent Match and Does Not Match

As with the desktop options above, these fields tell Lumar to fetch only data from Logz.io that matches a specific mobile user-agent string and does not match another.

The default options for mobile user agent are:

  • Match: (.*Mobile.*Googlebot.*http://www.google.com/bot.html.*)
  • Does Not Match: This field is empty by default and can be completed to meet your needs

If you require customized user-agent strings, please get in contact with your customer success manager.
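
A quick way to sanity-check these defaults is to run them against Google's published Googlebot user-agent strings. This minimal Python sketch mirrors the match/does-not-match logic described above; it is a local check, not Lumar's own implementation:

    import re

    # Default patterns from the query builder fields above.
    DESKTOP_MATCH = r"(.*Googlebot.*http://www.google.com/bot.html.*)"
    DESKTOP_NOT = r"(.*Mobile.*)"
    MOBILE_MATCH = r"(.*Mobile.*Googlebot.*http://www.google.com/bot.html.*)"

    # Googlebot user-agent strings as published by Google.
    desktop_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    mobile_ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
                 "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
                 "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)")

    def is_desktop_googlebot(ua: str) -> bool:
        return bool(re.search(DESKTOP_MATCH, ua)) and not re.search(DESKTOP_NOT, ua)

    def is_mobile_googlebot(ua: str) -> bool:
        return bool(re.search(MOBILE_MATCH, ua))

    print(is_desktop_googlebot(desktop_ua))  # True
    print(is_desktop_googlebot(mobile_ua))   # False (contains "Mobile")
    print(is_mobile_googlebot(mobile_ua))    # True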

Max number of URLs

This is the maximum number of URLs you want to fetch from the Logz.io API.

Please be aware that this field will not override the total URL limit in the project settings.

URL field name

The URL field name is the name of the URL column in the Logz.io database. This field helps Lumar look up the column that lists all relative URL rows in Logz.io, and fetch the pages which are in the log files.

By default, the query builder will look for a column called “request”. For most websites, this will allow Lumar to look up the right column and fetch the relevant URL rows.

Screenshot of the Logz.io query details with the URL field name highlighted.

However, each website is unique, and different tech stacks can cause these columns to be named differently. This means that sometimes, the URL field name will need to be updated.

To do this, navigate to the Logz.io dashboard > Logs > Kibana:

Logz.io nav - see Kibana drop-down menu - to update URL names

Click on the arrow icon next to the top row of the log file data.

Logz.io nav menu - click on small arrow icon next to log file item

The arrow icon will open up all the columns and data for that specific hit:

Logz.io log file detailed data page - opened from larger list

In this drop-down, look for the column containing the URL that was requested in the log file data. Be careful not to mix it up with the source URL column.

Logz.io log file detailed data page - identify the URL
Once you have identified the URL, make a note of the name of the column. In the example screenshot above, this is “request”.

Go back to the query builder in Lumar and make sure the URL field name matches the name of the column.

User agent field name

The user agent field name is the name of the user agent column in the Logz.io database. This field helps Lumar look up the column that lists all the user agent strings in Logz.io, and applies the user agent regexes to filter for particular bot hits.

By default, the query builder will look for a column called “agent”. For most websites, this will allow Lumar to look up the right column and fetch the relevant URLs with particular user agents.

Screenshot of the Logz.io query details with the User Agent field name highlighted.

However, each website is unique, and different tech stacks can cause these columns to be named differently. This means that sometimes the user agent field name will need to be updated.

To do this, navigate to the Logz.io dashboard > Kibana > Discover.
Logz.io nav - see Kibana drop-down menu - to update URL names

Click on the arrow icon next to the top row of log file data.

Logz.io Kibana Discover Dropdown

The arrow will open up all the columns and data for that specific hit.

Logz.io Fields

In this drop-down, look for the column containing the user agent string that made the request.

Logz.io log file - find the user agent

Once you have identified the user agent column, please make a note of the name it is using. Go back to the query builder in Lumar and make sure the user agent field name matches the name of the column.
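
For reference, a single expanded hit in Kibana looks something like the hypothetical document below, with the request path in a “request” column and the user agent string in an “agent” column (the defaults Lumar looks for). Your own column names and fields may differ:

    {
      "@timestamp": "2023-01-15T09:21:43.000Z",
      "request": "/example.html",
      "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
      "response": 200,
      "referrer": "https://www.lumar.com/"
    }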

 

Filtering log file data in the query builder

The Lumar query builder can also filter data that is fetched from Logz.io using JSON.

Screenshot of the Logz.io query filter (JSON) option.

For example, you could filter on a specific domain or subdomain.
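
As an illustration, a filter restricting results to a single subdomain might look like the Elasticsearch-style snippet below. The “domain” field name is hypothetical and depends on how your logs are parsed:

    {
      "bool": {
        "filter": [
          { "term": { "domain": "blog.lumar.com" } }
        ]
      }
    }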

Our team recommends getting in touch with the customer success team if you want to filter using JSON.

 

Frequently Asked Questions

Should I run a sample crawl?

Yes, our team always recommends running a sample crawl when a new Logz.io query has been set up as a crawl source.

Running a sample crawl will prevent you from having to wait for a massive crawl to finish only to discover there was an issue with the settings. It also helps to reduce the number of wasted credits.

Why is Lumar not pulling in log file data?

These are the most common reasons why Lumar may not be pulling in log file data from Logz.io:

  • The API token is not from the correct account.
  • The user agent regex is not correct.
  • The URL fields are not correct.

If log files are still not being pulled in after these issues have been resolved, then we recommend getting in touch with your customer success manager.

Why is Lumar not able to crawl log file data?

Sometimes log file data is being pulled in correctly, but due to other issues, the crawl still fails.

Our team also recommends reading the “how to debug blocked crawls” and “how to fix failed website crawls” documentation.

 

Further questions on Logz.io?

If you’re still having trouble setting up Logz.io, then please don’t hesitate to get in touch with our support team.
