General

Why should I use Arkime?

If you're in search of a comprehensive, standalone open-source solution for full packet capture (FPC) and network analysis that includes both metadata parsing and searching capabilities, Arkime stands out as the premier choice. Full packet capture systems are indispensable tools for network and security analysts, offering an unfiltered view of network activities and enabling a detailed analysis of events from a network perspective. As an open-source platform, Arkime affords its users total control over both deployment and architecture, ensuring that you can tailor the system to meet your specific requirements. While there are other FPC systems available, Arkime's unique blend of features and open-source accessibility makes it a standout solution for those needing detailed and actionable network insights.

How do you pronounce our name?

(/ɑːrkɪˈmi/) Read more about why we changed our name here.

Upgrading Arkime

Upgrading Arkime is a sequential process that requires installing each major version in the order outlined in the table below. This step-by-step approach ensures that your system transitions smoothly between versions without missing critical updates or features. If you find that your current version of Arkime is not explicitly mentioned in the chart, it is recommended to upgrade to the immediately higher version listed. By following this method, you can sequentially upgrade through the major releases until your system is up-to-date. Typically, transitioning between versions requires the execution of the db.pl upgrade script. This script is designed to update your database schemas and configurations to be compatible with the newer version of Arkime. Unless specified otherwise in the upgrade documentation, running this script should be the only additional step needed to complete the upgrade process.

Name Version | Min Version to Upgrade From | OpenSearch Versions | Elasticsearch Versions | Special Instructions | Notes
Arkime 5.0+ | 4.3.2+ (4.6.0 recommended) | 1.0.0+ (2.10+ recommended) | 7.10+, 8+ (7.17+, 8.10+ recommended) | Arkime 5.x instructions |
Arkime 4.0+ | 3.3.0+ (3.4.0 recommended) | 1.0.0+ (2.3+ recommended) | 7.10+, 8+ | Arkime 4.x instructions |
Arkime 3.0+ | 2.4.0 | 1.0.0+ (2.3 recommended) | 7.10+, not 8.x | Arkime 3.x instructions |
Arkime 2.7+ | 2.0.0 | N/A | 7.4+ (7.9.0+ recommended, 7.7.0 broken) | Elasticsearch 7 instructions |
Moloch 2.2+ | 1.7.0 (1.8.0 recommended) | N/A | 6.8.2+ (6.8.6+ recommended), 7.1+ (7.8.0+ recommended, 7.7.0 broken) | Moloch 2.x instructions | Must already be on 6.8.x or 7.1+ before upgrading to 2.2
Moloch 2.0, 2.1 | 1.7.0 (1.8.0 recommended) | N/A | 6.7, 6.8, 7.1+ | Moloch 2.x instructions | Must already be on Elasticsearch 6.7 or 6.8 (Elasticsearch 6.8.6 recommended) before upgrading to 2.0
Moloch 1.8 | 1.0.0 (1.1.x recommended) | N/A | 5.x or 6.x | Elasticsearch 6 instructions | Must have finished the 1.x reindexing; stop captures for best results
Moloch 1.1.1 | 0.20.2 (0.50.1 recommended) | N/A | 5.x or 6.x (new only) | Instructions | Must be on Elasticsearch 5 already
Moloch 0.20.2 | 0.18.1 (0.20.2 recommended) | N/A | 2.4, 5.x | Elasticsearch 5 instructions |

What operating systems are supported?

Arkime is prepackaged for a wide range of operating systems; the packages are available on the downloads page. The Arkime development team predominantly works with the EL 8 build, choosing between the pcap and afpacket readers based on the specific needs of each deployment. You are encouraged to use the afpacket reader whenever possible to achieve the best capture performance. While a substantial portion of our development efforts takes place on macOS, using the Homebrew package manager, this environment has not been vetted in a production setting. :) Arkime has also phased out support for 32-bit machines; consequently, the software is incompatible with many lower-powered devices. Furthermore, our support is currently limited to LTS versions of Ubuntu, due to potential library compatibility issues with non-LTS releases.

The following operating system distributions and versions are supported directly:

Arkime is not working

Here is the common checklist to perform when diagnosing a problem with Arkime (replace /opt/arkime with /data/moloch for Moloch builds):

  1. Check that OpenSearch/Elasticsearch is running and GREEN by using the curl command curl http://localhost:9200/_cat/health on the machine running OpenSearch/Elasticsearch. An Unauthorized response probably means that you need user:pass in all OpenSearch/Elasticsearch URLs or that you are using the wrong URL.
  2. Check that the db has been initialized with the /opt/arkime/db/db.pl http://elasticsearch.hostname:9200 info command. You should see information about the database version and number of sessions.
  3. Check that viewer is reachable by visiting http://arkime-viewer.hostname:8005 from your browser.
    1. If it doesn't render, looks strange, or warns of an old browser, use a newer supported browser.
    2. If the browser can't connect and you are sure viewer.js is running, verify there are no firewalls blocking access between your browser and the viewer host.
    3. Make sure viewHost=localhost is NOT set in the config.ini file. Test that curl http://IP:8005 works from the host viewer is running on.
  4. Check for errors in /opt/arkime/logs/viewer.log and that viewer is running with the pgrep -lf viewer command. If the UI looks strange or isn't working, viewer.log will usually have information about what is wrong.
  5. Check for errors in /opt/arkime/logs/capture.log and that capture is running with the pgrep -lf capture command. If packets aren't being processed or there are other metadata generation issues, capture.log will usually have information about what is wrong and links to the FAQ on how to fix it.
  6. To check that the stats page shows the capture nodes you are expecting, visit http://arkime-viewer.hostname:8005/stats?statsTab=1 in your browser.
    1. If the number of packets being received for any node is low, that node is having issues; please check its capture.log.
    2. If the timestamp for any node is over 5 seconds old, that node is having issues; please check its capture.log.
    3. If the Disk Q or ES Q for any node is above 50, that node is having issues; please check its capture.log.
  7. Disable any bpf= in /opt/arkime/etc/config.ini. If that fixes the issue, read the BPF FAQ answer.
  8. If the browser has "Oh no, Arkime is empty! There is no data to search." but the stats tab shows packets are being captured:
    1. Arkime in live capture mode only writes records when a session has ended. It may take several minutes for a session to show up after a fresh start. See /opt/arkime/etc/config.ini to shorten the timeouts.
    2. OpenSearch/Elasticsearch will only refresh the indices once a minute with the default Arkime configuration. Force a refresh with the curl http://elasticsearch.hostname:9200/_refresh command.
    3. Verify that your time frame for search covers the data (try switching to ALL).
    4. Check that you don't have a view set.
    5. Check that your user doesn't have a forced expression set. You might need to ask your Arkime admin.
  9. If you are having packet capture issues, restart capture after turning on debugging: either add --debug to the start line or add debug=1 in the [default] section of your config.ini file. You can add multiple --debug options or set debug= to a larger number to get even more information. Capture will print out the configuration settings it is using; verify that they are what you expect.
  10. If you are having issues viewing packets that were captured, restart viewer after turning on debugging: either add --debug to the start line or add debug=1 in the [default] section of your config.ini file. You can add multiple --debug options or set debug= to a larger number to get even more information. Viewer will print out the configuration settings it is using; verify that they are what you expect.
    1. Make sure the plugins and parsers directories are correctly set in /opt/arkime/etc/config.ini and readable by the viewer process.
  11. Check the output of the following:
    grep moloch_packet_log /opt/arkime/logs/capture.log | tail
    Verify that the packets number is greater than 0. If not, then no packets were processed. Verify that the first pstats number is greater than 0. If not, Arkime didn't know how to decode any packets.
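The first item of the checklist can be scripted. The following is a minimal sketch, assuming the default _cat/health output, in which the cluster status is the fourth whitespace-separated column. The function names health_status and check_cluster are illustrative and are not part of Arkime.

```shell
# Print the status column (green, yellow, or red) from _cat/health output on stdin.
# Assumes the default column order: epoch, timestamp, cluster, status, ...
health_status() {
    awk 'NF >= 4 { print $4; exit }'
}

# Hypothetical helper: succeed only when the cluster at URL $1 reports green.
# Put user:pass in the URL if the cluster answers Unauthorized.
check_cluster() {
    status=$(curl -s "$1/_cat/health" | health_status)
    echo "cluster status: ${status:-unreachable}"
    [ "$status" = "green" ]
}
```

On the machine running OpenSearch/Elasticsearch, this could be called as check_cluster http://localhost:9200 before moving on to the db.pl and viewer checks.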

How do I reset Arkime?

  1. Leave OpenSearch/Elasticsearch running.
  2. Shut down all running viewer or capture processes so that no new data is recorded.
  3. To delete all the SPI data stored in OpenSearch/Elasticsearch, use the db.pl script with either the init or wipe commands. The only difference between the two commands is that wipe leaves the added users so that they don't need to be re-added.
    /opt/arkime/db/db.pl http://ESHOST:9200 wipe
  4. Delete the PCAP files. The PCAP files are stored on the file system in raw format. You need to do this on all of the capture machines.
    /bin/rm -f /opt/arkime/raw/*

Self-Signed or Private CA TLS Certificates

The core Arkime team advises against the use of self-signed certificates, despite their technical feasibility. Instead, we encourage leveraging the financial benefits derived from using Arkime over commercial full packet capture solutions to invest in legitimate certificates. The cost of wildcard certificates has decreased significantly, making them an affordable option. Alternatively, free certificates from Let's Encrypt represent a viable option. While members of the Arkime Slack workspace may offer assistance, the core development team typically refers queries back to this guidance. It's important to note that private CA certificates face similar challenges and require analogous solutions as those encountered with self-signed certificates.

One of the simplest methods to bypass the hurdles associated with self-signed certificates involves adding them to your operating system's list of recognized certificates or chains. The process for doing so varies widely across different OS distributions and versions, making a quick internet search the most efficient strategy to find specific instructions. It may be necessary to register the certificate across multiple trust stores due to the varied certificate validation locations utilized by node (for the viewer), curl (for capture), and perl (for db.pl). The Viewer component of Arkime includes a caTrustFile option, a feature introduced by contributors to the project. Since version 4.2.0, all components of Arkime are designed to support the caTrustFile configuration.

An alternative, though less secure, option involves disabling certificate verification entirely. Key components such as Capture, Viewer, arkime_add_user.sh, and db.pl accept the --insecure flag to bypass certificate checks. This flag must be appended to the startup commands for both capture and viewer services. In newer versions of Arkime this can be done as follows:

    cp /opt/arkime/etc/env.sample /opt/arkime/etc/capture.env
    echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/capture.env
    cp /opt/arkime/etc/env.sample /opt/arkime/etc/viewer.env
    echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/viewer.env

You can edit /opt/arkime/etc/capture.env and /opt/arkime/etc/viewer.env to add other options. We recommend using env files instead of editing the systemd files since they may get overwritten by new installs.

How do I upgrade to Moloch 1.x?

Moloch 1.x has some large changes and updates that will require all session data to be reindexed. The reindexing is done in the background AFTER upgrading, so there is little downtime. Large changes in 1.0 include the following:

If you have any special parsers, taggers, plugins, or WISE sources, you may need to change configurations.

To upgrade:

Once 1.1.1 is working, you need to reindex the old session data:

How do I upgrade to Moloch 2.x?

Upgrading to Moloch 2.x is a multistep process that requires an outage. An outage is required because all the captures must be stopped before upgrading the database so that there are no schema issues or corruption. Most of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with 6.7 or 6.8. This is very important when upgrading to Elasticsearch 7.x or later.

How do I upgrade to Arkime 3.x?

Upgrading to Arkime 3.x is a multistep process that requires an outage. An outage is required because all the captures MUST be stopped before upgrading the database so that there are no schema issues or corruption. Do not restart the capture processes until the db.pl upgrade has finished! All of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with version 7. This is very important when upgrading to Elasticsearch 8.x or later.

Breaking Changes

Instructions

How do I upgrade to Arkime 4.x?

Upgrading to Arkime 4.x requires that you are already using Arkime 3.3.0 or later. Arkime 4.x uses a new permissions model with roles.

Breaking Changes

Instructions

How do I upgrade to Arkime 5.x?

Upgrading to Arkime 5 requires that you are already using Arkime 4.3.2 or later.

Breaking Changes

Instructions


OpenSearch/Elasticsearch

Arkime is designed to work seamlessly with both OpenSearch and Elasticsearch, underscoring our commitment to support both platforms moving forward. While some of the documentation and configurations might exclusively mention Elasticsearch, it's important to note that OpenSearch is compatible with Arkime versions that support Elasticsearch 7 and above. As the two systems continue to evolve separately, there might be new features introduced that are specific to either OpenSearch or Elasticsearch. Rest assured, Arkime is committed to remaining accessible without the necessity for any paid features of Elasticsearch, though we may choose to provide optional support for such features.

How many OpenSearch/Elasticsearch nodes or machines do I need?

The answer, of course, is "it depends." Factors include:

The following are some important things to remember when designing your cluster:

We have some estimators that may help.

The good news is that it is easy to add new nodes in the future, so feel free to start with fewer nodes. As a temporary fix for capacity problems, you can reduce the number of days of metadata that are stored. You can use the Arkime ES Indices tab to delete the oldest sessions2 or sessions3 index.

Data never gets deleted

The SPI data in OpenSearch/Elasticsearch and the PCAP data are not deleted at the same time. The PCAP data is deleted as the disk fills up on the capture machines. See here for more information. PCAP deletion happens automatically, and nothing needs to be done. The SPI data is either deleted by using ILM or when the ./db.pl expire command is run, usually from cron during off peak. Unless you use ILM, the SPI data deletion does NOT happen automatically, and a cron job MUST be set up. A cron setup that only keeps 90 days of data and expires at midnight might look like this:

 0 0 * * * /opt/arkime/db/db.pl http://localhost:9200 expire daily 90

So deleting a PCAP file will NOT delete the SPI data, and deleting the SPI data will not delete the PCAP data from disk.

The UI does have commands to delete and scrub individual sessions, but the user must have the Remove Data ability on the users tab. This feature is used for things you don't want operators to see, such as bad images, and not as a general solution for freeing disk space.

ERROR - Dropping request

This error means that your OpenSearch/Elasticsearch cluster cannot keep up with the number of sessions that the capture nodes are trying to send it or there are too many messages being sent. You may only see the error message on your busiest capture nodes because capture tries to buffer the requests.

Check the following:

If these don't help, you need to add more nodes or reduce the number of sessions being monitored. You can reduce the number of sessions with packet-drop-ips, bpf filters, or rules files, for example.

When do I add additional nodes? Why are queries slow?

If queries are too slow, the easiest fix is to add additional OpenSearch/Elasticsearch nodes. OpenSearch/Elasticsearch doesn't perform well if Java hits an OutOfMemory condition. If you ever have one, you should immediately delete the oldest *sessions* index, update the daily expire cron to delete more often, and restart the OpenSearch/Elasticsearch cluster. Then you should order more machines. :)

Removing Nodes

If you need to remove nodes from your OpenSearch/Elasticsearch cluster, follow these steps as an Admin:

  1. Access the Arkime stats page and navigate to the ES Shards tab.
  2. Identify and select the nodes you wish to remove, then choose to exclude them.
  3. Allow some time for the shards to be relocated.
  4. If no shards are relocated, you might need to adjust the OpenSearch/Elasticsearch settings to permit the allocation of at least two shards per node. This is particularly crucial if you're in the process of removing several nodes from the cluster. A higher number of shards per node may be necessary to facilitate this process effectively.

    curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/sessions*/_settings' -d '{
      "index.routing.allocation.total_shards_per_node": 2
    }'
  5. If a large number of shards require redistribution, the default settings might result in the process taking several days, which can be beneficial for maintaining cluster stability, but annoying. To expedite the process, consider increasing the transfer rate from the default setting of 3 streams at 20 MB (totaling 60 MB/sec) to a higher throughput, such as 6 streams at 50 MB (300 MB/sec). Make adjustments based on the disk and network capabilities of the new nodes.
    curl -XPUT -H 'Content-Type: application/json' localhost:9200/_cluster/settings -d '{"transient":{
      "indices.recovery.concurrent_streams":6,
      "indices.recovery.max_bytes_per_sec":"50mb"}
    }'
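The throughput figures in step 5 are simple arithmetic, and the same arithmetic gives a rough idea of how long a relocation will take. The sketch below takes the text's model (streams multiplied by the per-stream rate) at face value. The function names and the 20000 GB shard volume in the usage note are made up for illustration.

```shell
# Aggregate recovery throughput in MB/sec: number of streams times per-stream rate.
recovery_rate() {
    echo $(( $1 * $2 ))
}

# Rough hours needed to move $1 GB of shards at $2 MB/sec (1 GB = 1024 MB).
# Ignores network contention and indexing load, so treat it as a lower bound.
relocation_hours() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / rate / 3600 }'
}
```

For example, relocation_hours 20000 "$(recovery_rate 3 20)" estimates the time to move 20000 GB at the default rate, and swapping in recovery_rate 6 50 shows the effect of the higher setting.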

How to enable OpenSearch/Elasticsearch replication

Turning on replication will consume twice the disk space on the nodes and increase the network bandwidth between nodes.

To change future days, run the following command:

db/db.pl http://ESHOST:9200 upgrade --replicas 1

To change past days but not the current day, run the following command:

db/db.pl http://ESHOST:9200 expire <type> <num> --replicas 1

We recommend the second solution because it allows current traffic to be written to OpenSearch/Elasticsearch once, and during off peak the previous day's traffic will be replicated.

How do I upgrade OpenSearch/Elasticsearch?

In general, if upgrading between minor or build versions of Elasticsearch, you can perform a rolling upgrade with no issues. Follow Elastic's instructions for the best results. Make sure you select the matching version of that document for your version of Elasticsearch from the dropdown menu on the right side of the screen.

Upgrading between major versions of Elasticsearch usually requires an upgrade of Arkime. See the following instructions:

How do I upgrade to Elasticsearch 8.x?

Elasticsearch 8.x is NOT supported before 3.4.1, and we recommend that you use Arkime 4.x.

  1. You must first upgrade to Arkime 3.4.1 or higher and run db.pl http://ESHOST:9200 upgrade while still using Elasticsearch 7.
  2. Elasticsearch 8 can only perform upgrades from Elasticsearch 7.17 or later, so you will need to upgrade Elasticsearch to 7.17 or later.
  3. Make sure your Elasticsearch configuration files are ready for Elasticsearch 8. We do not provide sample Elasticsearch configurations, but here are some things to look out for:
    • By default, Elasticsearch 8 enables HTTPS and passwords; make sure you update your Arkime configuration file to use them.
    • There are several configuration variable changes. We suggest trying your elasticsearch.yml configuration file with a test cluster.
    • You may want to read the Elasticsearch 8.0 breaking changes.
  4. Follow Elastic's rolling upgrade instructions.

How do I upgrade to Elasticsearch 7.x?

  1. If you are NOT using Arkime DB version 63 (or later), you must follow these instructions while still using Elasticsearch 6.x and upgrade to Moloch 2.x. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Arkime.
  2. Make sure your Elasticsearch configuration files are ready for Elasticsearch 7. We do not provide sample Elasticsearch configurations, but here are some things to look out for:
  3. Now you need to upgrade from Elasticsearch 6 to Elasticsearch 7. There are two options:
    1. Upgrading to Elasticsearch 7 if using Elasticsearch 6.8.6 (or later) can be done with a rolling upgrade. Follow Elastic's instructions for the best results. You do NOT need to stop capture/viewer, but after the rolling upgrade is finished, you may want to restart capture everywhere.
    2. If you are not using Elasticsearch 6.8.6, or if you would prefer to perform a full restart, follow the instructions below:
      1. Make sure you delete any old indices that db.pl notified you about when you installed Moloch 2.x.
      2. Shut down everything: Elasticsearch, capture, and viewer.
      3. Upgrade Elasticsearch to 7.x (7.8.0 or later is recommended).
      4. Start the Elasticsearch cluster.
      5. Wait for the cluster to go GREEN. This will take LONGER than usual as Elasticsearch upgrades indices from the 6.x to the 7.x format.
        curl http://localhost:9200/_cat/health
      6. Start viewers and captures.
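Step 5 of the full-restart procedure, waiting for the cluster to go GREEN, can be scripted as a polling loop. This is a minimal sketch, assuming the default _cat/health column order (status in the fourth field). The names fetch_health and wait_for_green are illustrative, and the 10 second default interval is an arbitrary choice.

```shell
# Fetch one line of cluster health from the URL used in the steps above.
fetch_health() {
    curl -s http://localhost:9200/_cat/health
}

# Poll until the status column (fourth field) reads green.
# $1 is the number of seconds to sleep between polls (default 10).
# An empty or failed response counts as "not green", so the loop keeps waiting.
wait_for_green() {
    until fetch_health | awk '$4 == "green" { ok = 1 } END { exit !ok }'; do
        sleep "${1:-10}"
    done
    echo "cluster is green"
}
```

Running wait_for_green 30 after starting the Elasticsearch cluster would block until the index upgrade has finished, at which point viewers and captures can be started.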

How do I upgrade to Elasticsearch 6.x?

Elasticsearch 6.x is supported by Moloch 1.x for NEW clusters and >= 1.5 for UPGRADING clusters.

NOTE – If upgrading, you must FIRST upgrade to Moloch 1.0 or 1.1 (1.1.1 is recommended) before upgrading to > 1.5. Also, all reindex operations need to be finished.

We do NOT provide Elasticsearch 6 startup scripts or configuration, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to Elasticsearch 6 will REQUIRE two downtimes.

First outage: If you are NOT using Moloch DB version 51 (or later), you must follow these steps while still using Elasticsearch 5.x. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Moloch.

Second outage: Upgrade to Elasticsearch 6.

How do I upgrade to Elasticsearch 5.x?

Elasticsearch 5.x is supported by Moloch 0.17.1 for NEW clusters and 0.18.1 for UPGRADING clusters.

Elasticsearch 5.0.x, 5.1.x, and 5.3.0 are NOT supported because of Elasticsearch bugs/issues. We currently use 5.6.7.

WARNING – If you have sessions-* indices created with Elasticsearch 1.x, you can NOT upgrade. Those indices will need to be deleted.

We do NOT provide Elasticsearch 5 startup scripts, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to Elasticsearch 5 may REQUIRE 2 downtime periods of about 5–15 minutes each.

First outage: If you are NOT using Moloch DB version 34 (or later), you must follow these steps while still using Elasticsearch 2.4. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Moloch.

Second outage: Upgrade to Elasticsearch 5.

version conflict, current version [N] is higher or equal to the one provided [M]

This error usually happens when the capture process is trying to update the stats data and falls behind. Arkime will continue to function while this error occurs with the stats or dstats index; however, it does usually mean that your Elasticsearch cluster is overloaded. You should consider increasing your Elasticsearch capacity by adding more nodes, CPU, and/or more memory. If increasing Elasticsearch capacity isn't an option, then reduce the amount of traffic that Arkime processes.
If the N vs. M version numbers are very different from each other, it usually means that you are running two nodes with the same node name at the same time, which is not supported.

Here are some of our recommended OpenSearch/Elasticsearch settings. Many of these can be updated on the fly, but it is still best to put them in your elasticsearch.yml file. We strongly recommend using the same elasticsearch.yml file on all hosts. Things that need to be different per host can be set with variables.

Disk Watermark

You will probably want to change the watermark settings so you can use more of your disk space. You have the option to use ALL percentages or ALL values, but you can't mix them. The most common sign of a problem with these settings is an error that has FORBIDDEN/12/index read-only / allow delete in it. You can use ./db.pl http://ESHOST:9200 unflood-stage _all to clear the error once you adjust the settings and/or delete some data. Elasticsearch Docs

cluster.routing.allocation.disk.watermark.low: 97%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99%

Or, if you want more control, use values instead of percentages:

cluster.routing.allocation.disk.watermark.low: 300gb
cluster.routing.allocation.disk.watermark.high: 200gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
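Because percentages and values cannot be mixed, it can be worth checking the three settings before rolling out a new elasticsearch.yml. The helper below is a hypothetical sketch and not an Arkime or Elasticsearch tool. It only classifies the values passed to it by whether they end in a percent sign.

```shell
# Classify watermark values as all percentages, all byte values, or a mix.
# Prints "percent", "bytes", or "mixed"; "mixed" is the combination to avoid.
watermark_style() {
    pct=0
    abs=0
    for v in "$@"; do
        case "$v" in
            *%) pct=$((pct + 1)) ;;
            *)  abs=$((abs + 1)) ;;
        esac
    done
    if [ "$pct" -gt 0 ] && [ "$abs" -gt 0 ]; then
        echo mixed
    elif [ "$pct" -gt 0 ]; then
        echo percent
    else
        echo bytes
    fi
}
```

For example, watermark_style 97% 98% 99% and watermark_style 300gb 200gb 100gb describe the two examples above, while watermark_style 97% 200gb 99% reports the mix that needs fixing.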

Shard Limit

Raise this limit if you have a lot of shards that you want to be able to search against at once. Elasticsearch Docs

action.search.shard_count.limit: 100000

Write Queue Limit

This no longer needs to be set as of Elasticsearch 7.9. If you hit a lot of bulk failures, this can help, but Elastic doesn't recommend raising it too much. In older versions of Elasticsearch, it is named thread_pool.bulk.queue_size, so check the docs for your version. Elasticsearch Docs

thread_pool.write.queue_size: 10000

HTTP Compression

Enables HTTP compression; it is on by default in most versions. Elasticsearch Docs

http.compression: true

Recovery Time

To speed up recovery times and startup times, there are a few controls to experiment with. Make sure you test them in your environment and slowly increase them because they can break things badly. Elasticsearch Allocation Docs and Elasticsearch Recovery Docs

cluster.routing.allocation.cluster_concurrent_rebalance: 10
cluster.routing.allocation.node_concurrent_recoveries: 5
cluster.routing.allocation.node_initial_primaries_recoveries: 5
indices.recovery.max_bytes_per_sec: "400mb"

Logging

By default, Elasticsearch has logging set to debug level in log4j2.properties. For busy clusters, change this to info level to lower CPU and disk usage.


logger.action.level = info

Using ILM/ISM with Arkime

Since Moloch 2.2, you can easily use ILM to move indices from hot to warm, force merge, and delete. We recommend only using ILM with newer versions (7.2+) of Elasticsearch because older versions had some issues. Once ILM is enabled, you no longer have to use the db.pl expire cron job but should occasionally run db.pl optimize-admin.

ILM is only included in the free "basic" Elasticsearch license, so it is not part of the Elasticsearch OSS distribution, and you may need to upgrade. Arkime does NOT currently support the ILM auto rollover feature, for search performance reasons.

These instructions assume you are using db.pl or the Arkime UI to set up ILM and will use a special molochtype attribute name. You can also use Kibana to create the ILM config and not use the molochtype attribute name, but you will then need to do everything on your own. In order for ILM to work correctly with Arkime, follow these five important steps:

  1. If you are using a hot/warm design or might in the future, for each Elasticsearch node, add a line to your elasticsearch.yml file with node.attr.molochtype: warm or node.attr.molochtype: hot
  2. Create the molochsessions and molochhistory ILM policies. This can be done easily with Kibana, or we recommend the db.pl ilm command.
  3. Assign the molochsessions and molochhistory ILM policies to all the existing indices. Kibana or db.pl ilm can perform this action.
  4. Change the moloch templates to use the ILM policies for NEW indices. You'll need to rerun db.pl upgrade ... and add --ilm to the command. Also add --hotwarm if using a hot/warm design.
  5. Replace the previous db.pl expire cron job with db.pl optimize-admin

So for example, to create a new policy that keeps 30 weeks of history, 90 days of SPI data, 1 replica, and optimizes all indices older than 25 hours, you would run:

    ./db.pl http://localhost:9200 ilm 25h 90d --history 30 --replicas 1

You would then need to run upgrade with all the arguments you usually use, plus --ilm:

    ./db.pl http://localhost:9200 upgrade --replicas 1 --shards 5 --ilm


Capture

What kind of capture machines should I buy?

Arkime prioritizes cost-effective solutions by utilizing standard hardware components. Consider carefully before opting for upgrades like SSDs or high-end network cards. In some cases, adding another server might be a more economical choice. This approach not only reduces individual machine costs but also enhances overall data retention capacity.

Choosing the right machine for the job requires careful consideration. Here are some key factors to keep in mind:

When selecting Arkime capture boxes, standard "Big Data" boxes might be the best bet ($10k–$25k each). Look for:

We are big fans of using network packet brokers (NPBs) ($6k+). They allow multiple taps/mirrors to be aggregated and load balanced across multiple capture machines. Read more in the following sections.

What kind of NPB should I buy?

We are big fans of using NPBs, and we recommend that medium or large Arkime deployments use an NPB. See MolochON 2017 NPB Presentation.

Main advantages:

Features to look for:

Just like with Arkime, with commodity hardware, you don't necessarily have to pay a lot of money for a good NPB. Some switch vendors offer switches that can operate in switch mode or NPB mode, so you might already have gear you can use.

Sample vendors

What kind of packet capture speeds can arkime-capture handle?

On modern commodity hardware, achieving throughput of 5 Gbps or more is easy, depending largely on the number of CPUs allocated to Arkime and the other tasks the machine is handling. The bottleneck in performance is almost always the speed of storing PCAP to disk! If your disks or RAID can't keep up, you either need to save less PCAP by using Arkime Rules and other options, select a faster RAID configuration and disks, or give Arkime dedicated disks. For further details, refer to the Architecture and Multiple Host sections. Arkime supports the use of multiple threads for both packet acquisition and packet processing.

A simple method to test a local RAID device:

dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
To test a NAS, leave off the oflag=direct and make sure you test with at least 3x the amount of memory so that cache isn't a factor:
dd bs=256k count=150000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test

The output represents the maximum disk performance. If you wish to obtain a more accurate assessment, run several tests and average the results. To avoid packet loss, it's advisable to operate Arkime at no more than 80% of the maximum disk performance. For systems utilizing RAID, aiming for about 60% of this performance metric can further minimize issues, especially during RAID rebuilds. It's important to note that network throughput is typically measured in bits, whereas disk performance is gauged in bytes, requiring the conversion of these measurements for accurate comparison.
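The bytes-to-bits conversion and the 80% and 60% guidelines can be combined into one small calculation. This is a sketch under the assumption that dd reports throughput in decimal megabytes per second. The function name safe_capture_gbps and the 1200 MB/s figure in the usage note are made up for illustration.

```shell
# Convert a measured disk rate in MB/s (bytes) to a target capture rate in Gbps (bits).
# $1 = disk throughput reported by dd, in MB/s
# $2 = percentage of that throughput to plan for (80, or about 60 for RAID)
safe_capture_gbps() {
    awk -v mbytes="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", mbytes * 8 * pct / 100 / 1000 }'
}
```

If dd averaged 1200 MB/s, safe_capture_gbps 1200 80 gives the traffic rate to stay under for a plain disk setup, and safe_capture_gbps 1200 60 gives the more conservative figure for RAID.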

Arkime requires full packet captures error

When you have an error about the capture length not matching the packet length, it is NOT an issue with Arkime. The issue is with the network card settings OR how the pcap file was created.

By default, modern network cards offload work that the CPUs would otherwise need to do. They will defragment packets or reassemble TCP sessions and pass the results to the host. However, this is NOT what we want for packet captures; we want what is actually on the network. So you will need to configure the network card to turn off all the features that hide the real packets from Arkime.

The sample config file (/opt/arkime/bin/arkime_config_interfaces.sh) turns off many common features but there are still some possible problems:

  1. If using containers or VMs for Arkime, you may need to turn off the features on the physical interface the VM interface is mapped to from the host OS, instead of inside the container/VM.
  2. If using a fancy card there may be other features that need to be turned off.
    1. You can usually find them with ethtool -k INTERFACE | grep on. Turn off anything that is still on and see if that fixes the problem. Items that say [fixed] can NOT be disabled with ethtool.
    2. For example ethtool -K INTERFACE tx off sg off gro off gso off lro off tso off
  3. Sometimes switching to the Arkime AFPacket read method will fix the issue.

Workarounds:

  1. For offline pcap files, make sure they were created without a snaplen, for example with tcpdump use -s 0.
  2. For offline pcap files, you can set readTruncatedPackets=true in the config file, but most protocol parsing in Arkime will not work.
  3. (Since 5.0.1) For live pcap capture, you can set readTruncatedPackets=true in the config file.
  4. For both offline and live capture, you can increase the max packet length with snapLen=65536 in the config file, although this is not recommended.

Why am I dropping packets? (and Disk Q issues)

There are several different types of packet drops and reasons for packet drops:

Arkime Version

Please make sure you are using a recent version of Arkime. Constant improvements are made and it is hard for us to support older versions.

Kernel and TPACKET_V3 support

The most common cause of packet drops with Arkime is leaving the reader default of libpcap instead of switching to tpacketv3, pfring or one of the other high performance packet readers. We strongly recommend tpacketv3. See plugin settings for more information.

Network Card Config

Make sure the network card is configured correctly by increasing the ring buf to max size and turning off most of the card's features. The features are not useful anyway, since we want to capture what is on the network instead of what the local OS sees. Example configuration:

# Set ring buf size, see max with ethtool -g eth0
ethtool -G eth0 rx 4096 tx 4096
# Turn off features, see available features with ethtool -k eth0
ethtool -K eth0 rx off tx off sg off tso off gso off

If Arkime was installed from the deb/rpm and the Configure script was used, this should already be done in /opt/arkime/bin/arkime_config_interfaces.sh.

packetThreads and the PacketQ is overflowing error

The packetThreads config option controls the number of threads processing the packets, not the number of threads reading the packets off the network card. You only need to change the value if you are getting the Packet Q is overflowing error. The packetThreads option is limited to 24 threads, but usually you only need a few. Configuring too many packetThreads is actually worse for performance, so start with a low number and increase it slowly. You can also change the size of the packet queue by increasing the maxPacketsInQueue setting.

To increase the number of threads the reader uses please see the documentation for the reader you are using on the settings page.

Disk and Disk Q issues

In general errors about the Disk Q being exceeded are NOT a problem with Arkime, but usually an issue with either the hardware or the packet rate exceeding what the hardware can save to disk. You will usually need to either fix/upgrade the hardware or reduce the amount of traffic being saved to disk.

  • Make sure swap has been disabled, swappiness is 0, or at the very least, isn't writing to the disk being used for PCAP.
  • Make sure the RAID isn't in the middle of a rebuild or something worse. Most RAID cards will have a status of OPTIMAL when things are all good and DEGRADED or SUBOPTIMAL when things are bad.
  • To test the RAID device use:
    dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
    This is the MAX disk performance. Run several times if desired and take the average. If you don't want to drop any packets, you shouldn't average more than ~80% of the MAX disk performance. If using RAID and you don't want to drop packets during a future rebuild, ~60% is a better value. Remember that most network numbers will be in bits while the disk performance will be in bytes, so you'll need to adjust the values before comparing.
  • Make sure you actually have enough disk write throughput capacity and disks. For example, for a 1G link with RAID 5 you may need:
    • At least 4 spindles if using a RAID 5 card with write cache enabled.
    • At least 8 spindles (or more) if using a RAID 5 card with write cache disabled.
  • Make sure your RAID card can actually handle the write rate. Many onboard RAID 5 controllers can not handle sustained 1G write rates.
  • Switch to RAID 0 from RAID 5 if you can live with the TOTAL data loss on a single disk failure.
  • If you are using xfs make sure you use mount options defaults,inode64,noatime
  • Don't run capture and OpenSearch/Elasticsearch on the same machine.

If using EMC for disks:

  • Make sure write cache is enabled for the LUNs.
  • If it is a CX with SATA drives, RAID-3 is optimized for large sequential I/O.
  • Monitor EMC lun queue depth, may be too many hosts sharing it.

To check your disk IO run iostat -xm 5 and look at the following:

  • wMB/s will give you the current write rate, does it match up with what you expect?
  • avgqu-sz should be near or less than 1, otherwise Linux is queueing writes instead of performing them.
  • await should be near or less than 10, otherwise the IO system is slow, which will slow capture down.
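The two numeric thresholds above can be expressed as a quick check once you have read the values off the iostat output. This is a minimal sketch: the function name and the exact cut-offs are assumptions taken from the rough guidance in the list, and the column names may differ between sysstat versions.

```python
def check_iostat(avgqu_sz: float, await_ms: float,
                 max_queue: float = 1.0, max_await_ms: float = 10.0) -> list[str]:
    """Return human-readable warnings for one iostat sample of the PCAP disk."""
    warnings = []
    if avgqu_sz > max_queue:
        # A queue longer than about 1 means writes are waiting instead of happening.
        warnings.append(f"avgqu-sz {avgqu_sz} > {max_queue}: writes are queueing")
    if await_ms > max_await_ms:
        # High await means the IO system is slow, which slows capture down.
        warnings.append(f"await {await_ms} > {max_await_ms}: IO system is slow")
    return warnings


if __name__ == "__main__":
    # Hypothetical samples copied by hand from `iostat -xm 5`.
    print(check_iostat(avgqu_sz=0.4, await_ms=3.0))
    print(check_iostat(avgqu_sz=5.0, await_ms=42.0))
```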

Other things to do/check:

  • If using RAID 5 make sure you have write cache enabled on the RAID card. Sometimes this is called WriteBack. Make sure the BBU is still good or you have write cache enabled even when the BBU isn't working or missing.
    • Adaptec Example: arcconf SETCACHE 1 LOGICALDRIVE 1 WBB
    • HP Example: hpssacli ctrl slot=0 modify dwc=enable
    • MegaCLI: MegaCli64 -LDSetProp -ForcedWB -Immediate -Lall -aAll ; MegaCli64 -LDSetProp Cached -L0 -a0 -NoLog

Other

  • There are conflicting reports that disabling irqbalancer may help.
  • Check that the CPU you are giving capture isn't handling lots of interrupts (cat /proc/interrupts).
  • Make sure other processes aren't using the same CPU as capture.

WISE

  • Cyclical packet drops may be caused by bad connectivity to the WISE server. Verify that WISE responds quickly by running the following on the capture host that is dropping packets:
    curl http://arkime-wise.hostname:8081/views

High Performance Settings

See settings

How do I import existing PCAPs?

Think of the capture binary much like you would tcpdump. The capture binary can listen to live network interface(s), or read from historic packet capture files. Currently Arkime works best with PCAP files, not PCAPng.

/opt/arkime/bin/capture -c [config_file] -r [PCAP file]

For an entire directory, use -R [PCAP directory]

See /opt/arkime/bin/capture --help for more info. Common options are --monitor to monitor non-NFS directories, --skip to skip already loaded PCAP files, and -R to process directories. Multiple -r and -R options can be used.

If Arkime is failing to load a PCAP file check the following things:

By default importing offline pcap does NOT make a copy of the pcap file, Arkime saves a reference to the original file, which shows up as locked on the files tab. If you want to make a copy of the pcap file, use the --copy option with capture.

Enable Arkime UI to upload

It is also possible to enable PCAP upload in the Arkime UI. This is less efficient than just using capture directly, since it uploads the file and then runs capture for you. Just uncomment the uploadCommand in the config.ini file.

How do I monitor multiple interfaces?

The easy way is using the interface setting in your config.ini. It supports a semicolon ';' separated list of interfaces to listen on for live traffic. If you want to set a tag or another field per interface, use the interfaceOps setting.

The hard way: you can also run multiple capture processes.

You only need to run one viewer on the machine. Unless it is started with the -n option, it will still use the hostname as the node name, so any special settings need to be set there (although default is usually good enough).

Arkime capture crashes

Please file an issue on github with the stack trace.

If it is easy to reproduce, sometimes it's easier to just run gdb as root:

ERROR - pcap open failed

Usually capture is started as root so that it can open the interfaces and then it immediately drops privileges to dropUser and dropGroup, which are by default nobody:daemon. This means that all parent directories need to be either owned or at least executable by nobody:daemon and that the pcapDir itself must be writeable.

How to reduce amount of traffic/pcap?

Listed in order from highest to lowest benefit to Arkime

  1. Setting the bpf= filter will stop Arkime from seeing the traffic.
  2. Adding CIDRs to the packet-drop-ips section will stop Arkime from adding packets to the PacketQ
  3. Using Rules it is possible to control if the packets are written to disk or the SPI data is sent to OpenSearch/Elasticsearch

Life of a packet

Arkime capture supports many options for controlling which packets are captured, processed, and saved to disk.

PCAP Deletion

Arkime does NOT support having pcapDir and the OpenSearch/Elasticsearch data directory on the same file system. Arkime will NOT work in this configuration as the tools fight for space.

PCAP deletion is handled by the viewer process, so it is important that viewer is running on all capture instances. The viewer process checks on startup and then every minute to see how much space is available. When free space is below freeSpaceG, then viewer will start deleting the oldest PCAP files. The viewer process logs every time a file is deleted, so it is possible to figure out when a file is deleted if needed. If viewer complains about not finding the PCAP file, make sure you check the viewer.log for errors.

Note: freeSpaceG can be a number of gigabytes (for example, freeSpaceG=1000) or a percentage; the default is freeSpaceG=5%. The viewer process will always leave at least 10 PCAP files on the disk, so make sure there is room for at least maxFileSizeG * 10 of capture files on disk, or 120G by default.
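The arithmetic in the note can be sketched as follows. This is an illustration, not viewer's actual code: the function names are hypothetical, the percentage is assumed to be relative to the total size of the PCAP filesystem, and the default maxFileSizeG of 12 is inferred from the 120G figure above.

```python
def free_space_threshold_gb(free_space_g: str, disk_total_gb: float) -> float:
    """Return the free-space level (in GB) below which PCAP deletion would start."""
    value = free_space_g.strip()
    if value.endswith("%"):
        # Assumption: the percentage is taken of the total filesystem size.
        return disk_total_gb * float(value[:-1]) / 100.0
    return float(value)


def min_pcap_space_gb(max_file_size_g: float = 12, keep_files: int = 10) -> float:
    """Space needed for the PCAP files that viewer always leaves on disk."""
    return max_file_size_g * keep_files


if __name__ == "__main__":
    # Hypothetical 20 TB PCAP filesystem with the default freeSpaceG=5%.
    print(free_space_threshold_gb("5%", 20000))
    print(min_pcap_space_gb())
```

On a small filesystem the second number can exceed the first, which is one way deletion can appear not to keep up.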

If still having PCAP delete issues:

  1. Verify freeSpaceG is set correctly for the node in the config file
  2. Restart viewer after turning on debugging by adding debug=2 in the [default] section of your config.ini file. After restarting viewer, check the viewer.log for messages or use grep -i expire /opt/arkime/logs/viewer.log to see the relevant messages
  3. Make sure there is free space where viewer is writing its logs if you don't see any messages in viewer.log
  4. Verify that dropUser or dropGroup can actually delete files in the PCAP directory and has read/execute permissions in all parent directories. So for example you need to check the /opt and the /opt/arkime and the /opt/arkime/raw directory permissions. The PCAP files will have read/write permissions which is normal.
  5. Make sure the PCAP directory is on a filesystem with at least maxFileSizeG * 10 space available.
  6. If there is a mismatch between the files in the directory and the files on the Files tab run the db.pl http://localhost:9200 sync-files command
  7. Make sure the files on the Files tab don't have locked set; viewer won't delete locked files.
  8. Try restarting viewer.
  9. If using SELinux (sestatus), temporarily disable it (setenforce 0) and see if that fixes the problem.

dontSaveBPFs doesn't work

There are several common reasons dontSaveBPFs might not work for you.

  1. Look at the saved PCAP, not the packet count in the UI, Arkime will still count the number of packets, it just won't save them
  2. Make sure you've spelled it dontSaveBPFs, case matters
  3. Make sure you've placed dontSaveBPFs in the correct section, you can verify by adding a --debug to capture when starting and looking at the output
  4. Turns out BPF filters are tricky. :) When the network is using vlans, then at compile time, BPFs need to know that fact. So instead of a nice simple dontSaveBPFs=tcp port 443:10 use something like dontSaveBPFs=tcp port 443 or (vlan and tcp port 443):10. Basically FILTER or (vlan and FILTER). Information from here.
  5. Try testing your filter manually with tcpdump, you should only see the traffic you want to drop. So something like tcpdump -i INTERFACE tcp port 443 for example.
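The "FILTER or (vlan and FILTER)" pattern from item 4 can be generated mechanically. The helper below is a hypothetical sketch, not part of Arkime. It always parenthesizes the filter so that filters containing their own "or" keep their meaning, and it copies the number after the colon through unchanged.

```python
def vlan_aware_dont_save_bpf(bpf_filter: str, suffix: int) -> str:
    """Build a dontSaveBPFs line that matches both untagged and vlan-tagged traffic."""
    wrapped = f"({bpf_filter.strip()})"
    # Pattern from the text: FILTER or (vlan and FILTER), followed by :N.
    return f"dontSaveBPFs={wrapped} or (vlan and {wrapped}):{suffix}"


if __name__ == "__main__":
    print(vlan_aware_dont_save_bpf("tcp port 443", 10))
```

The expression part of the result (everything between "=" and the final ":N") is also what you would hand to tcpdump when testing the filter manually.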

If still having issues, you might just try out an Arkime Rules file. Arkime converts dontSaveBPFs into a rule for you behind the scenes, so Arkime Rules are actually more powerful.

Zero or missing bytes PCAP files

Arkime optimizes disk writes for efficiency, making it highly suitable for high bandwidth networks. However, this approach might not be ideal for low bandwidth environments. The amount of data Arkime buffers before writing to disk is determined by the pcapWriteSize setting, which has a default value of 262144 bytes. It's crucial to remember that this buffering occurs on a per-thread basis. Therefore, for low bandwidth networks, it's advisable to set packetThreads to 1 (a single thread). The system is designed to flush the buffered PCAP to disk after 10 seconds of inactivity, but direct-io requires pagesize bytes to still be buffered, typically around 4096 bytes.
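To get a feel for why buffering matters on slow links, the rough numbers can be worked out as below. This is a back-of-the-envelope sketch with hypothetical function names. It only uses the default pcapWriteSize from the text and ignores compression, the 10-second idle flush, and direct-io page alignment.

```python
def max_buffered_bytes(packet_threads: int, pcap_write_size: int = 262144) -> int:
    """Upper bound on PCAP bytes held in memory, since buffering is per thread."""
    return packet_threads * pcap_write_size


def seconds_to_fill_buffer(link_mbps: float, pcap_write_size: int = 262144) -> float:
    """Time for one thread's buffer to fill if it receives all traffic at link_mbps."""
    return pcap_write_size * 8 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # Hypothetical low bandwidth link: 1 Mbps of traffic.
    print(max_buffered_bytes(packet_threads=4))
    print(round(seconds_to_fill_buffer(1.0), 2), "seconds per buffer with one thread")
```

With several threads the same traffic is spread across several buffers, so each one takes proportionally longer to fill, which is why a single thread is suggested for low bandwidth networks.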

Encountering an error message such as ERROR - processSessionIdDisk - SESSIONID in file FILENAME couldn't read packet at FILEPOS packet # 0 of 2 or Not enough data 0 for header 16 usually indicates that the PCAP data is either still in the process of buffering and requires more time to be fully written to disk, or it suggests that a capture process or system crash occurred before the PCAP data could be saved. It may be useful to turn OFF compression simpleCompression=none on low bandwidth networks since compression causes more data to be buffered.

Note, running out of disk space can lead to the creation of numerous zero-byte PCAP files. For details on managing disk space and preventing such issues, refer to PCAP Deletion.

Can I virtualize Arkime with KVM using OpenVswitch?

In small environments with low amounts of traffic this is possible. With Open vSwitch you can create a mirror port from a physical or virtual adapter and send the data to another virtual NIC as the listening interface. In KVM, one issue is that it isn't possible to increase the buffer size past 256 on the adapter when using the Virtio network adapter (mentioned in another part of the FAQ). Without a larger buffer, Arkime capture will continuously crash. To solve this in KVM, use the E1000 adapter and configure the buffer size accordingly. Set up the SPAN port on Open vSwitch to send traffic to it: https://www.rivy.org/2013/03/configure-a-mirror-port-on-open-vswitch/.

Installing MaxMind Geo free database files

MaxMind changed how you download their free database files. You now need to sign up for an account and set up the geoipupdate program. If using a version of Moloch before 2.2, you will need to edit your config.ini file and update the geolite paths.

Instructions:

  1. Sign up for a MaxMind account (no purchase required)
  2. Wait for MaxMind email and set your password
  3. Install the geoipupdate tool, paying attention to the version installed. On many distributions you can just run yum install geoipupdate or apt install geoipupdate
  4. Create a license key
  5. Select Yes when asked "Will this key be used for GeoIP Update?" and select the version you have
  6. Use the MaxMind feature to generate a config file for you, usually you will replace /etc/GeoIP.conf with this file
  7. Run geoipupdate as root and see if it works
  8. If you are using Moloch before 2.2, update your /data/moloch/etc/config.ini file so that geoLite2Country is now /usr/share/GeoIP/GeoLite2-Country.mmdb and geoLite2ASN is now /usr/share/GeoIP/GeoLite2-ASN.mmdb
  9. Restart capture

What do these log lines mean?

Arkime logs a lot of information for debugging purposes. Much of this information is for bug reports, but it can also be used to figure out what is going on. You may need to use --debug to enable these messages.

HTTP Responses

Jan 01 01:01:01 http.c:369 moloch_http_curlm_check_multi_info(): 8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms

Jan 01 01:01:01
    Date
http.c:369
    File Name:Line Number
moloch_http_curlm_check_multi_info
    Function Name
8000/30
    8000 queued requests to server
    30 connections to server
ASYNC
    Asynchronous request, SYNC for Synchronous request
200
    HTTP status code
http://eshost:9200/_bulk
    Requested URL
250342/5439
    250342 bytes uploaded (CURLINFO_SIZE_UPLOAD)
    5439 bytes downloaded (CURLINFO_SIZE_DOWNLOAD)
14ms
    14ms to connect to server (CURLINFO_CONNECT_TIME)
12345ms
    12345ms total request time (CURLINFO_TOTAL_TIME)

Periodic Packet Progress

Jan 01 01:01:01 packet.c:1185 moloch_packet_log(): packets: 3911000000 current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 pstats: 4132185901/1/2/3/4/5/6

Jan 01 01:01:01
    Date
packet.c:1185
    File Name:Line Number
moloch_packet_log
    Function Name
packets: 3911000000
    3911000000 packets are going to be processed by the packet queues. These packets have made it past corrupt checks, packet-drop-ips checks, and are ones we most likely understand.
current sessions: 41771/45251
    41771 monitored sessions of the current session type (usually tcp)
    45251 monitored sessions total
oldest: 0
    In the current session type queue, the oldest session should be idled out in 0 seconds
recv: 4028852297
    4028852297 packets have been received by the interface since capture started, as reported by the reader's stats api
drop: 123
    123 packets have been dropped by the interface since capture started, as reported by the reader's stats api
(0.00)
    0.00% of packets have been dropped by the interface since capture started, as reported by the reader's stats api
queue: 1
    1 bulk request is waiting to be sent to the OpenSearch/Elasticsearch servers, each bulk request may hold multiple sessions
disk: 2
    2 disk buffer writes are outstanding, each buffer will hold multiple packets
packet: 3
    3 packets are waiting to be processed in all the packet queues
close: 4
    4 tcp sessions have been marked for closing (RST/FIN), waiting on last few packets
ns: 5
    5 sessions are ready to be saved but there is a plugin that is doing async work, such as WISE
frags: 0/1988
    0: always 0
    1988 current ip frags waiting to be matched
pstats: 4132185901/1/2/3/4/5/6
    4132185901 packets successfully sent to a packet queue
    1 packet dropped because of packet-drop-ips config
    2 packets dropped because the packet queues were overloaded
    3 packets dropped because they were corrupt
    4 packets dropped because how to process was unknown to us
    5 packets dropped because of ipport rules
    6 packets dropped because of packet deduping (2.7.1 enablePacketDedup)
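The drop counters are the part of this line most often needed when chasing packet loss. The sketch below extracts recv, drop, and the seven pstats counters from the sample line. The key names given to the pstats positions are hypothetical labels for the meanings listed above, and the patterns are based only on this one sample.

```python
import re

SAMPLE = ("Jan 01 01:01:01 packet.c:1185 moloch_packet_log(): packets: 3911000000 "
          "current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) "
          "queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 "
          "pstats: 4132185901/1/2/3/4/5/6")

# Hypothetical labels for the seven pstats positions, in the order listed above.
PSTATS_NAMES = ("queued", "drop_ips", "queue_overload", "corrupt",
                "unknown", "ipport_rules", "dedup")


def parse_packet_progress(line: str) -> dict:
    """Extract interface and pstats drop counters from a packet progress line."""
    recv = int(re.search(r"\brecv: (\d+)", line).group(1))
    drop = int(re.search(r"\bdrop: (\d+)", line).group(1))
    counts = [int(n) for n in re.search(r"pstats: ([\d/]+)", line).group(1).split("/")]
    return {
        "recv": recv,
        "drop": drop,
        # Percentage of packets dropped by the interface since capture started.
        "drop_pct": 100.0 * drop / recv if recv else 0.0,
        "pstats": dict(zip(PSTATS_NAMES, counts)),
    }


if __name__ == "__main__":
    stats = parse_packet_progress(SAMPLE)
    print(stats["drop"], f"{stats['drop_pct']:.2f}%", stats["pstats"])
```

A growing queue_overload count points at packetThreads or maxPacketsInQueue, while a growing drop count points at the reader or network card configuration.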


Viewer

Where do I learn more about the expressions available

Click on the owl and read the Search Bar section. The Fields section is also useful for discovering fields you can use in a search expression.

Exported PCAP files are corrupt, sometimes session detail fails

The most common cause of this problem is that the clocks on the Arkime machines are out of sync. Make sure NTP is running everywhere, or that the clocks are otherwise kept in sync.

Map counts are wrong

What browsers are supported?

Recent versions of Chrome, Firefox, and Safari should all work fairly equally. Below are the minimum versions required. We aren't kidding.

Arkime Version   Chrome  Firefox  Opera  Safari  Edge  IE
Prior to 3.0     53      54       40     10      14    Not Supported
3.x, 4.x         80      74       67     13.1    80    Not Supported
5.x and beyond   92      95       78     15.4    92    Not Supported

Development and testing is done mostly with Chrome on a Mac, so it gets the most attention.

Error: getaddrinfo EADDRINFO

This seems to be caused when proxying requests from one viewer node to another and the machines don't use FQDNs for their hostnames and the short hostnames are not resolvable by DNS. You can check if your machine uses FQDNs by running the hostname command. There are several options to resolve the error:

  1. Use the --host option on capture
  2. Configure the OS to use FQDNs.
  3. Make it so DNS can resolve the shortnames or add the shortnames to the hosts file.
  4. Edit config.ini and add a viewUrl for each node. This part of the config file must be the same on all machines (we recommend you just use the same config file everywhere). Example:
    [node1_eth0]
    interface=eth0
    viewUrl=http://node1.fqdn
    [node1_eth1]
    interface=eth1
    viewUrl=http://node1.fqdn
    [node2]
    interface=eth1
    viewUrl=http://node2.fqdn

How do I proxy Arkime using Apache

Apache, and other web servers, can be used to provide authentication or other services for Arkime when set up as a reverse proxy. When a reverse proxy is used for authentication it must be inline, and authentication in Arkime will not be used; however, Arkime will still do the authorization. Arkime will use a username that the reverse proxy passes to Arkime as an HTTP header for settings and authorization. See the architecture page for diagrams. While operators will use the proxy to reach the Arkime viewer, the viewer processes still need direct access to each other.

I still get prompted for password after setting up Apache auth

  1. Make sure the user has the "Web Auth Header" checked
  2. Make sure in the viewer config userNameHeader is the lower case version of the header Apache is using.
  3. Run viewer.js with a --debug and see if the header is being sent.

How do I search multiple Arkime clusters

It is possible to search multiple Arkime clusters by setting up a special Arkime MultiViewer and a special MultiES process. The MultiES process is similar to Elasticsearch tribe nodes, except it was created before tribe nodes and can deal with multiple indices having the same name. The MultiViewer talks to MultiES instead of a real OpenSearch/Elasticsearch instance. Currently one big limitation is that all Arkime clusters must use the same serverSecret.

To use MultiES, create another config.ini file or section in a shared config file. Both multies.js and the special "all" viewer can use the same node name. See Multi Viewer Settings for more information.

# viewer/multies node name (-n allnode)
[allnode]
# The host and port multies is running on, set with multiESHost:multiESPort usually just run on the same host
elasticsearch=127.0.0.1:8200
# This is a special multiple arkime cluster viewer
multiES=true
# Port the multies.js program is listening on, elasticsearch= must match
multiESPort = 8200
# Host the multies.js program is listening on, elasticsearch= must match
multiESHost = localhost
# Semicolon list of OpenSearch/Elasticsearch instances, one per arkime cluster.  The first one listed will be used for settings
# You MUST have a name set
multiESNodes = http://escluster1.example.com:9200,name:escluster1,prefix:PREFIX;http://escluster2.example.com:9200,name:escluster2
# Uncomment if not using different rotateIndex settings
#queryAllIndices=false

Now you need to start up both the multies.js program and viewer.js with the same config file AND -n allnode. All other viewer settings, including webBasePath can still be used.
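The multiESNodes value packs several fields into one string, so it is easy to get a separator wrong. The sketch below splits a value the same way the sample is laid out (semicolons between clusters, commas between a URL and its key:value options) and rejects entries without a name. It is a hypothetical validation helper, not the parser MultiES itself uses.

```python
def parse_multi_es_nodes(value: str) -> list[dict]:
    """Split a multiESNodes string into one dict per Arkime cluster."""
    nodes = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        url, *options = entry.split(",")
        node = {"url": url.strip()}
        for option in options:
            # Options look like name:escluster1 or prefix:PREFIX.
            key, _, val = option.strip().partition(":")
            node[key] = val
        if not node.get("name"):
            raise ValueError(f"multiESNodes entry has no name: {entry}")
        nodes.append(node)
    return nodes


if __name__ == "__main__":
    sample = ("http://escluster1.example.com:9200,name:escluster1,prefix:PREFIX;"
              "http://escluster2.example.com:9200,name:escluster2")
    for node in parse_multi_es_nodes(sample):
        print(node)
```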

By default, the users table comes from the first cluster listed in multiESNodes. This can be overridden by setting usersElasticsearch and optionally usersPrefix in the multi viewer config file.

How do I use self-signed SSL/TLS Certificates with MultiES?

Since 4.2.0 MultiES supports the caTrustFile setting.

Prior to 4.2.0, you will need to create a file, for example CAcerts.pem, containing one or more trusted certificates in PEM format.

Then start MultiES with the NODE_EXTRA_CA_CERTS environment variable set to the path of the file you just created, for example:

NODE_EXTRA_CA_CERTS=./CAcerts.pem /opt/arkime/bin/node multies.js -c /opt/arkime/etc/config.ini -n allnode

How do I reset my password?

An admin can change anyone's password on the Users tab by clicking the Settings link in the Actions column next to the user.

A password can also be changed by using the addUser script, which will replace the entire account if the same userid is used. All preferences and views will be cleared, so creating a secondary admin account may be a better option if you need to change an admin user's password. After creating a secondary admin account, change the user's password and then delete the secondary admin account.

node addUser -c <configfilepath> <user id> <user friendly name> <password> [--admin]

Error: Couldn't connect to remote viewer, only displaying SPI data

Viewers have the ability to proxy traffic for each other. The ability relies on Arkime node names that are mapped to hostnames. Common problems are when systems don't use FQDNs or certs don't match.

How do viewers find each other

First the SPI records are created on the capture side.

  1. Each capture gets a nodename, either by the -n command line option or everything in front of the first period of the hostname.
  2. Each capture writes a stats record every few seconds that has the mapping from the nodename to the FQDN. It is possible to override the FQDN with the --host option to capture.
  3. Each SPI record has a nodename in it.

When PCAP is retrieved from a viewer it uses the nodename associated with the SPI record to find which capture host to connect to.

  1. Each arkime-viewer process gets a nodename, either by the -n command line option or everything in front of the first period of the hostname.
  2. If the SPI record nodename is the same as the arkime-viewer nodename it can be processed locally, STOP HERE. This is the common case with one arkime node.
  3. If the stats[nodename].hostname is the same as the arkime-viewer's hostname (exact match) then it can be processed locally, STOP HERE. Remember this is written by capture above, either the FQDN or --host. This is the common case with multiple capture processes per capture node.
  4. If we make it here, the PCAP data isn't local and it must be proxied.
  5. If there is a viewUrl set in the [nodename] section, use that.
  6. If there is a viewUrl set in the [default] section, use that.
  7. Use stats[nodename].hostname:[nodename section - viewPort setting]
  8. Use stats[nodename].hostname:[default section - viewPort setting]
  9. Use stats[nodename].hostname:8005

Possible fixes

First, look at viewer.log on both the viewer machine and the remote machine and see if there are any obvious errors. The most common problems are:

  1. Not using the same config.ini on all nodes can make things a pain to debug and sometimes not even work. It is best to use the same config with different sections for each node name [nodename]
  2. The remote machine doesn't return a FQDN from the hostname command AND the viewer machine can't resolve just the hostname. To fix this, do ONE of the following:
    1. Use the --host option to capture and restart capture
    2. Make it so the remote machine returns a FQDN (hostname "fullname" as root and edit /etc/sysconfig/network)
    3. Set a viewUrl in each node section of the config.ini. If you don't have a node section for each host, you'll need to create one.
    4. Edit /etc/resolv.conf and add search foo.example.com, where foo.example.com is the subdomain of the hosts. Basically, you want it so "telnet shortname 8005" works on the viewer machine to the remote machine.
  3. The remote machine's FQDN doesn't match the CN or SANs in the cert it is presenting. The fixes are the same as #2 above.
  4. The remote machine is using a self signed cert. To fix this, either turn off HTTPS or see the certificate answer above.
  5. The remote machine can't open the PCAP. Make sure the dropUser user or dropGroup group can read the PCAP files. Check the directories in the path too.
  6. Make sure all viewers are either using HTTPS or not using HTTPS, if only some are using HTTPS then you need to set viewUrl for each node.
    1. When troubleshooting this issue, it is sometimes easier to disable HTTPS everywhere
  7. If you want to change the hostname of a capture node:
    1. Change your mind :)
    2. Reuse the same node name as previously with a -n option
    3. Use the viewUrl for that old node name that points to the new host.

Compiled against a different Node.js version error

Arkime uses Node.js for the viewer component, and requires many packages to work fully. These packages must be compiled with and run using the same version of Node.js. An error like …​ was compiled against a different Node.js version using NODE_MODULE_VERSION 48. This version of Node.js requires NODE_MODULE_VERSION 57. means that the version of Node.js used to install the packages and run the packages are different.

This shouldn't happen when using the prebuilt Arkime releases. If it does, then double check that /opt/arkime/bin/node is being used to run viewer.

If you built Arkime yourself, this usually happens if you have a different version of node in your path. You will need to rebuild Arkime and either:

How do I change the port viewer listens on?

By default viewer listens on port 8005. Changing this can be tricky, especially for a port less than 1024, like 443. You should definitely read the How do viewers find each other section.

Scenario: Change all nodes to port > 1024
Solution: Set viewPort in the [default] section on ALL nodes.

Scenario: Change single node port < 1024, remaining nodes (if any) unchanged
Solution: Usually, unless a program runs as root, it can NOT listen on ports less than 1024. Since viewer by default drops privileges before listening, even if you start it as root, it isn't root anymore when trying to listen on the port. Possible solutions are:
  • Use a reverse proxy like Apache/Nginx. This is a great option for a central node that needs to be behind SSO, when users are blocked from directly accessing all other nodes.
  • Use iptables to forward from the new port to 8005. Something like
    iptables -t nat -I PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8005
  • Fool around with the systemd CAP_NET_BIND_SERVICE setting.
  • Comment out the dropUser setting and change the viewPort setting in a [$nodename] section.

Scenario: All nodes, port < 1024
Solution: Just don't. :) If you must, most of the solutions above will work, but don't do the reverse proxy solution since viewer nodes need to talk to each other WITHOUT external authentication.

Hunts not working

Hunts require a single viewer node be specified to actually coordinate the hunts. Since 4.3.1 the preferred method is to set cronQueries=auto in the [default] section of the config.ini file for nodes that should be eligible to run hunts. This will automatically select a single node to run hunts, and if that node goes down, another node will be selected. If using central viewers, it is recommended to only set cronQueries=auto on the central viewers.

You can also force hunts to run on a specific node by setting cronQueries=true in the [default] section of the config.ini file. You must only set cronQueries=true on one and only one node.

If cronQueries is properly set up on a single node, and hunts still aren't working, make sure the cronQueries node is running and checking in. You can check this on the Stats -> ES Nodes tab and/or check the viewer logs.

See more information about the cronQueries setting here.


Cont3xt

Cont3xt is not working

Here is the common check list:

  1. Check that Cont3xt integrations are configured and not disabled
    curl http://localhost:3218/settings#integrations
  2. Verify there are no errors in the /opt/arkime/logs/cont3xt.log file.
  3. Restarting cont3xt after adding debug=2 to the [cont3xt] section may print useful information about what is wrong.

Parliament

Sample Apache Config

Parliament is designed to run behind a reverse proxy such as Apache. Basically, you just need to tell Apache to send all root requests and any /parliament requests to the Parliament server.

ProxyPassMatch   ^/$ http://localhost:8008/parliament retry=0
ProxyPass        /parliament/ http://localhost:8008/parliament/ retry=0


WISE

WISE is not working

Here is the common check list:

  1. Check that WISE is running
    curl http://localhost:8081/fields
    You should see a list of fields that WISE knows about.
  2. Check in your config.ini file you've added
  3. Check that from the capture/viewer hosts you can reach the WISE host and there are no ACL issues.
    curl http://WISEHOST:8081/fields
  4. Restarting capture with the --debug option may print useful information about what is wrong. Look to make sure that WISE is being called with the correct URL. Verify that the plugins, wiseHost, and wiseURL settings are what you actually think they are.

arkime.com

How can I contribute?

Want to add or edit this FAQ? Found an issue on this site? This site's code is open source. Please contribute!
