General

Why should I use Arkime?

If you're in search of a comprehensive, standalone open-source solution for full packet capture (FPC) and network analysis that includes both metadata parsing and searching capabilities, Arkime stands out as the premier choice. Full packet capture systems are indispensable tools for network and security analysts, offering an unfiltered view of network activities and enabling a detailed analysis of events from a network perspective. As an open-source platform, Arkime affords its users total control over both deployment and architecture, ensuring that you can tailor the system to meet your specific requirements. While there are other FPC systems available, Arkime's unique blend of features and open-source accessibility makes it a standout solution for those needing detailed and actionable network insights.

How do you pronounce our name?

(/ɑːrkɪˈmi/) Read more about why we changed our name here.

Upgrading Arkime

Upgrading Arkime is a sequential process that requires installing each major version in the order outlined in the table below. This step-by-step approach ensures that your system transitions smoothly between versions without missing critical updates or features. If you find that your current version of Arkime is not explicitly mentioned in the chart, it is recommended to upgrade to the immediately higher version listed. By following this method, you can sequentially upgrade through the major releases until your system is up-to-date. Typically, transitioning between versions requires the execution of the db.pl upgrade script. This script is designed to update your database schemas and configurations to be compatible with the newer version of Arkime. Unless specified otherwise in the upgrade documentation, running this script should be the only additional step needed to complete the upgrade process.

Name	Version	Min Version to Upgrade From	OpenSearch Versions	Elasticsearch Versions	Special Instructions	Notes
Arkime	6.0+	5.2.0+	1.x, 2.x, 3.x (2.10+ or 3.x recommended), 2.19.0 broken, 3.x requires Arkime 5.7.0+	7.10+, 8.x, 9.x (8.10+ or 9.x recommended)	Arkime 6.x instructions	Unreleased
Arkime	5.0+	4.3.2+ (4.6.0 recommended)	1.x, 2.x, 3.x (2.10+ or 3.x recommended), 2.19.0 broken, 3.x requires Arkime 5.7.0+	7.10+, 8+, 9.x (7.17+, 8.10+ recommended), 9.x requires Arkime 5.7.0+	Arkime 5.x instructions
Arkime	4.0+	3.3.0+ (3.4.0+ recommended)	1.x, 2.x (2.3+ recommended), 2.19.0 broken	7.10+, 8+	Arkime 4.x instructions
Arkime	3.0+	2.4.0	1.x, 2.x (2.3+ recommended), 2.19.0 broken	7.10+, not 8.x	Arkime 3.x instructions
Arkime	2.7+	2.0.0	N/A	7.4+ (7.9.0+ recommended, 7.7.0 broken)	Elasticsearch 7 instructions
Moloch	2.2+	1.7.0 (1.8.0 recommended)	N/A	6.8.2+ (6.8.6+ recommended), 7.1+ (7.8.0+ recommended, 7.7.0 broken)	Moloch 2.x instructions	Must already be on 6.8.x or 7.1+ before upgrading to 2.2
Moloch	2.0, 2.1	1.7.0 (1.8.0 recommended)	N/A	6.7, 6.8, 7.1+	Moloch 2.x instructions	Must already be on Elasticsearch 6.7 or 6.8 (Elasticsearch 6.8.6 recommended) before upgrading to 2.0
Moloch	1.8	1.0.0 (1.1.x recommended)	N/A	5.x or 6.x	Elasticsearch 6 instructions	Must have finished the 1.x reindexing; stop captures for best results
Moloch	1.1.1	0.20.2 (0.50.1 recommended)	N/A	5.x or 6.x (new only)	Instructions	Must be on Elasticsearch 5 already
Moloch	0.20.2	0.18.1 (0.20.2 recommended)	N/A	2.4, 5.x	Elasticsearch 5 instructions

What operating systems are supported?

Arkime is pre packaged to support a wide range of operating systems, available on the downloads page. The Arkime development team predominantly works with the EL 8 build, opting between the pcap and afpacket readers based on the specific needs of each deployment. You are encouraged to use the afpacket reader whenever possible to achieve the best capture performance. While a substantial portion of our development efforts takes place on macOS, leveraging the Homebrew package manager, it's important to note that this environment has not been vetted in a production setting. :) It's also worth mentioning that Arkime has phased out support for 32-bit machines, consequently, the software is incompatible with many lower-powered devices. Furthermore, our support is currently limited to LTS versions of Ubuntu, due to potential library compatibility issues with non-LTS releases.

The following operating systems distributions and versions are supported directly:

Amazon Linux 2 and EL 7 - will no longer be supported in Arkime 6
Amazon Linux 2023
Arch
Debian 12
EL 8 and 9
Ubuntu 20.04, 22.04, 24.04

Arkime is not working

Here is the common checklist to perform when diagnosing a problem with Arkime (replace /opt/arkime with /data/moloch for Moloch builds):

Check that OpenSearch/Elasticsearch is running and GREEN by using the curl command curl http://localhost:9200/_cat/health on the machine running OpenSearch/Elasticsearch. An Unauthorized response probably means that you need user:pass in all OpenSearch/Elasticsearch URLs or that you are using the wrong URL.
Check that the db has been initialized with the /opt/arkime/db/db.pl http://elasticsearch.hostname:9200 info command. You should see information about the database version and number of sessions.
Check that viewer is reachable by visiting http://arkime-viewer.hostname:8005 from your browser.
1. If it doesn't render, looks strange, or warns of an old browser, use a newer supported browser.
2. If the browser can't connect and you are sure viewer.js is running, verify there are no firewalls blocking access between your browser and the viewer host.
3. Make sure viewHost=localhost is NOT set in the config.ini file. Test that curl http://IP:8005 works from the host viewer is running on.
Check for errors in /opt/arkime/logs/viewer.log and that viewer is running with the pgrep -lf viewer command. If the UI looks strange or isn't working, viewer.log will usually have information about what is wrong.
Check for errors in /opt/arkime/logs/capture.log and that capture is running with the pgrep -lf capture command. If packets aren't being processed or other metadata generation issues, capture.log will usually have information about what is wrong and links to the FAQ on how to fix.
To check that the stats page shows the capture nodes you are expecting, visit http://arkime-viewer.hostname:8005/stats?statsTab=1 in your browser.
1. If the packets being received for any node is low, that node is having issues, please check its capture.log
2. If the timestamp for any node is over 5 seconds old, that node is having issues, please check its capture.log
3. If the Disk Q or ES Q for any node is above 50, that node is having issues, please check its capture.log
Disable any bpf= in /opt/arkime/etc/config.ini. If that fixes the issue, read the BPF FAQ answer.
If the browser has "Oh no, Arkime is empty! There is no data to search." but the stats tab shows packets are being captured:
1. Arkime in live capture mode only writes records when a session has ended. It may take several minutes for a session to show up after a fresh start. See /opt/arkime/etc/config.ini to shorten the timeouts.
2. OpenSearch/Elasticsearch will only refresh the indices once a minute with the default Arkime configuration. Force a refresh with the curl http://elasticsearch.hostname:9200/_refresh command.
3. Verify that your time frame for search covers the data (try switching to ALL).
4. Check that you don't have a view set.
5. Check that your user doesn't have a forced expression set. You might need to ask your Arkime admin.
If you are having packet capture issues, restart capture after turning on debugging, either add --debug to the start line or add debug=1 in the [default] section of your config.ini file. You can add multiple --debug options or set debug= to a larger number to get even more information. Capture will print out the configuration settings it is using; verify that they are what you expect.
If you are having issues viewing packets that were captured, restart viewer after turning on debugging, either add --debug to the start line or add debug=1 in the [default] section of your config.ini file. You can add multiple --debug options or set debug= to a larger number to get even more information. Viewer will print out the configuration settings it is using; verify that they are what you expect.
1. Make sure the plugins and parsers directories are correctly set in /opt/arkime/etc/config.ini and readable by the viewer process.
Check the output of the following:
```
grep arkime_packet_log /opt/arkime/logs/capture.log | tail
```
Verify that the packets number is greater than 0. If not, then no packets were processed. Verify that the first pstats number is greater than 0. If not, Arkime didn't know how to decode any packets.

How do I reset Arkime?

Leave OpenSearch/Elasticsearch running.
Shut down all running viewer or capture processes so that no new data is recorded.
To delete all the SPI data stored in OpenSearch/Elasticsearch, use the db.pl script with either the init or wipe commands. The only difference between the two commands is that wipe leaves the added users so that they don't need to be re-added.
```
/opt/arkime/db/db.pl http://ESHOST:9200 wipe
```
Delete the PCAP files. The PCAP files are stored on the file system in raw format. You need to do this on all of the capture machines.
```
/bin/rm -f /opt/arkime/raw/*
```

Self-Signed or Private CA TLS Certificates

The core Arkime team advises against the use of self-signed certificates, despite their technical feasibility. Instead, we encourage leveraging the financial benefits derived from using Arkime over commercial full packet capture solutions to invest in legitimate certificates. The cost of wildcard certificates has decreased significantly, making them an affordable option. Alternatively, free certificates from Let's Encrypt represent a viable option. While members of the Arkime Slack workspace may offer assistance, the core development team typically refers queries back to this guidance. It's important to note that private CA certificates face similar challenges and require analogous solutions as those encountered with self-signed certificates.

One of the simplest methods to bypass the hurdles associated with self-signed certificates involves adding them to your operating system's list of recognized certificates or chains. The process for doing so varies widely across different OS distributions and versions, making a quick internet search the most efficient strategy to find specific instructions. It may be necessary to register the certificate across multiple trust stores due to the varied certificate validation locations utilized by node (for the viewer), curl (for capture), and perl (for db.pl). The Viewer component of Arkime includes a caTrustFile option, a feature introduced by contributors to the project. Since version 4.2.0, all components of Arkime are designed to support the caTrustFile configuration.

An alternative, though less secure option, involves disabling certificate verification entirely. Key components such as Capture, Viewer, arkime_add_user.sh, and db.pl accept the --insecure flag to bypass certificate checks. This flag must be appended to the startup commands for both capture and viewer services. In newer versions of Arkime this can be done by


          cp /opt/arkime/etc/env.example /opt/arkime/etc/capture.env
          echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/capture.env
          cp /opt/arkime/etc/env.example /opt/arkime/etc/viewer.env
          echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/viewer.env
          cp /opt/arkime/etc/env.example /opt/arkime/etc/cont3xt.env
          echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/cont3xt.env

You can edit /opt/arkime/etc/capture.env and /opt/arkime/etc/viewer.env to add other options. We recommend using env files instead of editing the systemd files since they may get overwritten by new installs.

For node.js applications (viewer, wise, cont3xt), you can also set the NODE_EXTRA_CA_CERTS environment variable to the path of your CA certificate files.

How do I upgrade to Moloch 1.x?

Moloch 1.x has some large changes and updates that will require all session data to be reindexed. The reindexing is done in the background AFTER upgrading, so there is little downtime. Large changes in 1.0 include the following:

All the field names have been renamed, and analyzed fields have been removed.
Country codes are being changed from three characters to two characters.
Tags will NOT be migrated if added before 0.14.1.
The data for http.hasheader and email.hasheader will NOT migrate.
IPv6 is fully supported and uses the OpenSearch/Elasticsearch ip type.

If you have any special parsers, taggers, plugins, or WISE sources, you may need to change configurations.

All db fields will need -term removed, or capture won't start and will warn you.

To upgrade:

First make sure you are using Elasticsearch 5.5.x (5.6 recommended) and Moloch 0.20.2 or 0.50.x before continuing. Upgrade to those versions first!
Download 1.1.1 from the downloads page.
Shut down all capture, viewer, and WISE processes.
Install Moloch 1.1.1.
Run /data/moloch/bin/moloch_update_geo.sh on all capture nodes that will download the new mmdb style maxmind files.
Run db.pl http://ESHOST:9200 upgrade once.
Start WISE, then capture, then viewers. Especially watch the capture.log file for any warnings/errors.
Verify that NEW data is being collected and is showing up in viewer. All old data will NOT show up yet.

Once 1.1.1 is working, you need to reindex the old session data:

Disable any db.pl expire or optimize jobs or curator.
Start screen or tmux, because this will take several days.
In the /data/moloch/viewer directory, run /data/moloch/viewer/reindex2.js --slices X.
- The number of slices should be between 2 and the number of shards each index has; the higher slices the faster the conversion, but the more OpenSearch/Elasticsearch CPU that will be used. We recommend half the number of shards.
- You can optionally add an --index option if there are indices you need to reindex first. Otherwise, it will work from newest to oldest.
- You can optionally add --deleteOnDone, which will delete indices as they are converted, but you may want to try a reindex on one index first to make sure it is working.
As reindex runs, old data will show up in viewer.

Delete ALL old indices with the following:

curl -XDELETE 'http://localhost:9200/sessions-*'

Once the reindex finishes, run the db.pl expire/optimize or curator job manually. This will take a while.
Now you can reenable any db.pl expire or optimize jobs or curator. Do NOT reenable crons until you let them run and finish manually.

How do I upgrade to Moloch 2.x?

Upgrading to Moloch 2.x is a multistep process that requires an outage. An outage is required because all the captures must be stopped before upgrading the database so that there are no schema issues or corruption. Most of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with 6.7 or 6.8. This is very important when upgrading to Elasticsearch 7.x or later.

You must be using Moloch 1.7 or 1.8 (Moloch 1.8.0 recommended) BEFORE trying to upgrade to Moloch 2.x.
You must be using Elasticsearch 6.7 or 6.8 (Elasticsearch 6.8.6 or later is recommended) BEFORE trying to upgrade to Moloch 2.x.
Install Moloch >= 2.0 without restarting captures/viewers.
Optional: Run ./db.pl http://ESHOST:9200 backup pre20 to back up all administrative indices.
Shut down captures.
Run ./db.pl http://ESHOST:9200 upgrade.
Restart all capture, multies, and viewers (including both standalone viewers and those running with captures).
Verify that everything is working.

How do I upgrade to Arkime 3.x?

Upgrading to Arkime 3.x is a multistep process that requires an outage. An outage is required because all the captures MUST be stopped before upgrading the database so that there are no schema issues or corruption. Do not restart the capture processes until the db.pl upgrade has finished! All of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with version 7. This is very important when upgrading to Elasticsearch 8.x or later.

Breaking Changes

Elasticsearch before 7.10 is not supported.
All indices will now start with arkime_ after upgrading if a prefix was not previously used.
multies – The multiESNodes requires a name: attribute per entry. Versions 3.0.0–3.3.0 require a prefix: setting. Also, starting with 3.3.1, it defaults to prefix:arkime_.
wise – Custom sources will need to be modified to use the new JavaScript class design.
wise – Redis URLs have a new standard format.
wise – For JSON data, keyColumn has been renamed keyPath.
You may need to set the usersPrefix setting if your users index lives on an Arkime cluster that hasn't been upgraded to use arkime_ yet.
ilm – You will need to run the db.pl ilm command again after upgrading.

Instructions

You must be using Moloch 2.4+ (Moloch 2.7.1 is recommended) BEFORE trying to upgrade to Arkime 3.x.
You must be using Elasticsearch 7.10+ (Elasticsearch 7.10.2 or later is recommended) BEFORE trying to upgrade to Arkime 3.x.
Optional: Run ./db.pl http://ESHOST:9200 backup pre30 to back up all administrative indices.
Install Arkime or Arkime/Moloch Hybrid >= 3.0 without restarting captures/viewers (the hybrid distribution still uses /data/moloch and old binary names).
Shut down captures, multies, and viewers.
Run ./db.pl http://ESHOST:9200 upgrade [other options], and don't forget to include any other options you usually use, like --replicas or --ilm.
Verify that your config.ini and systemd files have the new /opt/arkime path instead of /data/moloch if moving from the Moloch/Hybrid to the Arkime distribution. If you continue to use the Hybrid distribution you do not need to change the paths.
If using ILM, run the db.pl ilm command again with all the same options that were used previously.
Restart all captures, multies, and viewers (including both standalone viewers and those running with captures).
Verify that everything is working.

How do I upgrade to Arkime 4.x?

Upgrading to Arkime 4.x requires that you are already using Arkime 3.3.0 or later. Arkime 4.x uses a new permissions model with roles.

Breaking Changes

systemd files are auto installed, but you still need to enable them manually.
Use roles for permission checking—the userAdmin role is required to edit users.
addUser.js use either the new --roles option or the --admin sets the superAdmin role
In header auth mode, userAuthIps allows only localhost by default.
Encrypted PCAP files now use the .arkime extension.
The WISE multiES prefix now defaults to arkime_.
There are new defaults for the maxFileSizeG=12 and compressES=true settings.
PCAP compression is turned on by default with simpleCompression=gzip; set to "none" to disable or "zstd".
The right-click group name was changed to value-actions in the configuration file.
The userId search on the history page no longer adds the surrounding wildcards automatically. This search box is only available for admin users.

Instructions

Install new Arkime rpm/deb.
Shut down ALL captures, multies, and viewers.
Run ./db.pl http://ESHOST:9200 upgrade [other options], and don't forget to include any other options you usually use, like --replicas or --ilm.
Verify that your config.ini and systemd files have the new /opt/arkime path instead of /data/moloch if moving from the Moloch/Hybrid to the Arkime distribution. If you continue to use the Hybrid distribution, you do not need to change the paths.
Restart all captures, multies, and viewers (including both standalone viewers and those running with captures).
Verify that everything is working.

How do I upgrade to Arkime 5.x?

Upgrading to Arkime 5 requires that you are already using Arkime 4.3.2 or later.

Breaking Changes

s3Compression/simpleCompression now defaults to zstd
s3WriteGzip removed, use s3Compression=gzip for old behavior
s3GapPacketPos defaults to TRUE
enablePacketDedup defaults to TRUE
authMode defaults to digest instead of anonymous
Removed old v1 APIs
The Parliament password has been removed. You must create a parliament.ini file or [parliament] section in your arkime config.ini before upgrading. See Parliament. You can configure common auth via the Parliament settings UI before upgrading or manually in the config file.
WISE/tagger must now use http.request.FIELD/http.response.FIELD when referencing header defined with headers-http-request/headers-http-response
Centos 7 build no longers supports pfring
simpleCompressionBlockSize defaults to 64000
right-click changed to value-actions in config

Instructions

Install new Arkime rpm/deb.
It is recommended but not required to shut down ALL captures, multies, and viewers.
Run /opt/arkime/db/db.pl http://ESHOST:9200 upgrade [other options], replace ESHOST and don't forget to include any other options you usually use, like --replicas or --ilm.
Restart all captures, multies, and viewers (including both standalone viewers and those running with captures).
Verify that everything is working.

How do I upgrade to Arkime 6.x?

Arkime 6 is still in development and not ready for production use.

Upgrading to Arkime 6 requires that you are already using Arkime 5.2.0 or later.

Breaking Changes

Instructions

Install new Arkime rpm/deb.
It is recommended but not required to shut down ALL captures, multies, and viewers.
Run /opt/arkime/db/db.pl http://ESHOST:9200 upgrade [other options], replace ESHOST and don't forget to include any other options you usually use, like --replicas or --ilm.
Restart all captures, multies, and viewers (including both standalone viewers and those running with captures).
Verify that everything is working.

OpenSearch/Elasticsearch

Arkime is designed to work seamlessly with both OpenSearch and Elasticsearch, underscoring our commitment to support both platforms moving forward. While some of the documentation and configurations might exclusively mention Elasticsearch, it's important to note that OpenSearch is compatible with Arkime versions that support Elasticsearch 7 and above. As the two systems continue to evolve separately, there might be new features introduced that are specific to either OpenSearch or Elasticsearch. Rest assured, Arkime is committed to remaining accessible without the necessity for any paid features of Elasticsearch, though we may choose to provide optional support for such features.

How many OpenSearch/Elasticsearch nodes or machines do I need?

The answer, of course, is "it depends." Factors include:

How much memory each box has.
For how many days you want to store metadata (SPI data).
How fast the disks are.
What percentage of the traffic is HTTP/DNS; these session use more OpenSearch/Elasticsearch resources.
The average transfer rate of all the interfaces.
Whether the sessions are long lived or short lived.
How fast response times should be for operators.
How many operators are querying at the same time.

The following are some important things to remember when designing your cluster:

SPI data is usually kept longer then PCAP data. For example, you may store PCAP data for a week but SPI data for a month.
Have at least 1% of disk space used by OpenSearch/Elasticsearch available in OpenSearch/Elasticsearch heap memory. For example, if the cluster has 7 TB of data, then 7*0.01 or 70 GB of OpenSearch/Elasticsearch heap memory is the minimum recommended.
Assign half the memory to OpenSearch/Elasticsearch (but no more then 30 G per node; read https://www.elastic.co/blog/a-heap-of-trouble) and half the memory to disk cache.
Use at least version 7 of Elasticsearch or version 2.3 of OpenSearch.
A quick disk requirement estimate is 5% of the PCAP storage, if storing for the same amount of time.
If you have large machines, they you can run multiple nodes per MACHINE, although this complicates deployments.

We have some estimators that may help.

The good news is that it is easy to add new nodes in the future, so feel free to start with fewer nodes. As a temporary fix for capacity problems, you can reduce the number of days of metadata that are stored. You can use the Arkime ES Indices tab to delete the oldest sessions2 or sessions3 index.

Data never gets deleted

Arkime does NOT support having pcapDir and the OpenSearch/Elasticsearch data directory on the same file system. Arkime will NOT work in this configuration as the tools fight for space.

The SPI data in OpenSearch/Elasticsearch and the PCAP data are not deleted at the same time. The PCAP data is deleted as the disk fills up on the capture machines. See here for more information. PCAP deletion happens automatically, and nothing needs to be done. The SPI data is either deleted by using ILM/ISM or when the ./db.pl expire command is run, usually from cron during off peak. You MUST setup either ILM/ISM or a cron job to delete the SPI data, nothing happens automatically. A cron setup that only keeps 90 days of data and expires at midnight might look like this:

 0 0 * * * /opt/arkime/db/db.pl http://localhost:9200 expire daily 90

So deleting a PCAP file will NOT delete the SPI data, and deleting the SPI data will not delete the PCAP data from disk.

The UI does have commands to delete and scrub individual sessions, but the user must have the Remove Data ability on the users tab. This feature is used for things you don't want operators to see, such as bad images, and not as a general solution for freeing disk space.

ERROR - Dropping request

This error means that your OpenSearch/Elasticsearch cluster can not keep up with the number of sessions that the capture nodes are trying to send it or there are too many messages being sent. You may only see the error message on your busiest capture nodes because capture tries to buffer the requests.

Check the following:

If OpenSearch/Elasticsearch is running on the same machine as capture, that is almost certainly the issue. While that is fine for a proof of concept, you will continue to run into problems.
The ES Nodes tab of the Stats section has the ability to turn on Write Task completed and rejected columns. Look for OpenSearch/Elasticsearch nodes having issues. Make sure those nodes don't have disk issues.
Make sure each OpenSearch/Elasticsearch node has 30 G of memory and 30 G of disk cache (at least) available to it. So for example, if you are on a 64 G machine, only run 1 OpenSearch/Elasticsearch node on the machine.
Try increasing the dbBulkSize to a larger value. Start with 4000000 (4MB), but we don't recommend larger then 20MB.
Try decreasing the packetThreads to a smaller value. Many folks have set packetThreads too large, which causes extra messages to OpenSearch/Elasticsearch. We recommend starting with packetThreads = 2 x Gbps.
OpenSearch/Elasticsearch does NOT perform well if there is one node that is sick. Check all the node hardware, disks, RAID, etc. Make sure that on the ES Nodes tab, there isn't a single node with a high OS Load and low Write/s, which might indicate an issue.
Make sure swap is turned off or swappiness is 0 on OpenSearch/Elasticsearch machines.
If you are running multiple OpenSearch/Elasticsearch nodes, make sure the disks can support the IOPS load. It is usually best to have each OpenSearch/Elasticsearch node use its own disk.
Make sure you are running the latest OpenSearch/Elasticsearch version that the version of Arkime supports; for example, 7.10.2+ if using Elasticsearch 7 or 2.3+ if using OpenSearch.
If using replication on the sessions index, turn off replication of the current day and only replicate previous days. This can be done by using --replicas 1 with your daily ./db.pl expire run after turning off replication in the sessions template using ./db.pl upgrade without the --replicas option.
Make sure there is at most one shard of each session per node. If there are more, run ./db.pl upgrade again.

If these don't help, you need to add more nodes or reduce the number of sessions being monitored. You can reduce the number of sessions with packet-drop-ips, bpf filters, or rules files, for example.

When do I add additional nodes? Why are queries slow?

If queries are too slow, the easiest fix is to add additional OpenSearch/Elasticsearch nodes. OpenSearch/Elasticsearch doesn't perform well if Java hits an OutOfMemory condition. If you ever have one, you should immediately delete the oldest *sessions* index, update the daily expire cron to delete more often, and restart the OpenSearch/Elasticsearch cluster. Then you should order more machines. :)

Removing Nodes

If you need to remove nodes from your OpenSearch/Elasticsearch cluster, follow these steps as an Admin:

Access the Arkime stats page and navigate to the ES Shards tab.
Identify and select the nodes you wish to remove, then choose to exclude them.
Allow some time for the shards to be relocated.
If no shards are relocated, you might need to adjust the OpenSearch/Elasticsearch settings to permit the allocation of at least two shards per node. This is particularly crucial if you're in the process of removing several nodes from the cluster. A higher number of shards per node may be necessary to facilitate this process effectively.
```
curl -XPUT 'localhost:9200/sessions*/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```
If a large number of shards require redistribution, the default settings might result in the process taking several days, which can be beneficial for maintaining cluster stability, but annoying. To expedite the process, consider increasing the transfer rate from the default setting of 3 streams at 20 MB (totaling 60 MB/sec) to a higher throughput, such as 6 streams at 50 MB (300 MB/sec). Make adjustments based on the disk and network capabilities of the new nodes.
```
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{
  "indices.recovery.concurrent_streams":6,
  "indices.recovery.max_bytes_per_sec":"50mb"}
}'
```

How to enable OpenSearch/Elasticsearch replication

Turning on replication will consume twice the disk space on the nodes and increase the network bandwidth between nodes.

To change future days, run the following command:

db/db.pl <http://ESHOST:9200> upgrade --replicas 1

To change past days but not the current day, run the following command:

db/db.pl <http://ESHOST:9200> expire <type> <num> --replicas 1

We recommend the second solution because it allows current traffic to be written to OpenSearch/Elasticsearch once, and during off peak the previous day's traffic will be replicated.

How do I upgrade OpenSearch/Elasticsearch?

In general, if upgrading between minor or build versions of Elasticsearch, you can perform a rolling upgrade with no issues. Follow Elastic's instructions for the best results. Make sure you select the matching version of that document for your version of Elasticsearch from the dropdown menu on the right side of the screen.

Upgrading between major versions of Elasticsearch usually requires an upgrade of Arkime. See the following instructions:

Elasticsearch 9
Elasticsearch 8
Elasticsearch 7
Elasticsearch 6
Elasticsearch 5

How do I upgrade to Elasticsearch 9.x?

Elasticsearch 9.x is NOT supported before Arkime 5, and we recommend that you use Arkime 5.7.1 or later.

Elasticsearch 9 can only perform upgrades from Elasticsearch 8.18 or later, so you will need to upgrade Elasticsearch to 8.18 or later.
You may want to read the Elasticsearch 9.0 breaking changes.
Read Elastic's suggested upgrade instructions.

How do I upgrade to Elasticsearch 8.x?

Elasticsearch 8.x is NOT supported before Arkime 3.4.1, and we recommend that you use Arkime 4.x.

You must first upgrade to Arkime 3.4.1 or higher and run db.pl http://ESHOST:9200 upgrade while still using Elasticsearch 7.
Elasticsearch 8 can only perform upgrades from Elasticsearch 7.17 or later, so you will need to upgrade Elasticsearch to 7.17 or later.
Make sure your Elasticsearch configuration files are ready for Elasticsearch 8. We do not provide sample Elasticsearch configurations, but here are some things to look out for:
- By default, Elasticsearch 8 enables HTTPS and passwords, make sure you update your Arkime configuration file to use them.
- There are several configuration variable changes. We suggest trying your elasticsearch.yml configuration file with a test cluster.
- You may want to read the Elasticsearch 8.0 breaking changes.
Follow Elastic's rolling upgrade instructions.

How do I upgrade to Elasticsearch 7.x?

Elasticsearch 7.x is supported by Moloch 2.x only if there are no Elasticsearch 5.x–created indices remaining. We recommend you upgrade to Elasticsearch 7.8.x or later.

Upgrading to Elasticsearch 7 MAY REQUIRE downtime.

If you are NOT using Arkime DB version 63 (or later), you must follow these instructions while still using Elasticsearch 6.x and upgrade to Moloch 2.x. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Arkime.
Make sure your Elasticsearch configuration files are ready for Elasticsearch 7. We do not provide sample Elasticsearch configurations, but here are some things to look out for:
- You MUST add a cluster.initial_master_nodes, and there may be other required discovery changes.
- If you have more then 1,000 shards per node, you must set cluster.max_shards_per_node.
- If you previously set indices.breaker.total.limit, you should unset it.
- You may want to read the Elasticsearch 7.0 breaking changes.
Now you need to upgrade from Elasticsearch 6 to Elasticsearch 7. There are two options:
1. Upgrading to Elasticsearch 7 if using Elasticsearch 6.8.6 (or later) can be done with a rolling upgrade. Follow Elastic's instructions for the best results. You do NOT need to stop capture/viewer, but after the rolling upgrade is finished, you may want to restart capture everywhere.
2. If you are not using Elasticsearch 6.8.6, or if you would prefer to perform a full restart, follow the instructions below:
  1. Make sure you delete any old indices that db.pl notified you about when you installed Moloch 2.x.
  2. Shut down everything: Elasticsearch, capture, and viewer.
  3. Upgrade Elasticsearch to 7.x (7.8.0 or later is recommended).
  4. Start the Elasticsearch cluster.
  5. Wait for the cluster to go GREEN. This will take LONGER than usual as Elasticsearch upgrades indices from the 6.x to the 7.x format.
```
curl http://localhost:9200/_cat/health
```
  6. Start viewers and captures.

How do I upgrade to Elasticsearch 6.x?

Elasticsearch 6.x is supported by Moloch 1.x for NEW clusters and >= 1.5 for UPGRADING clusters.

NOTE – If upgrading, you must FIRST upgrade to Moloch 1.0 or 1.1 (1.1.1 is recommended) before upgrading to > 1.5. Also, all reindex operations need to be finished.

We do NOT provide Elasticsearch 6 startup scripts or configuration, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to Elasticsearch 6 will REQUIRE two downtimes.

First outage: If you are NOT using Moloch DB version 51 (or later), you must follow these steps while still using Elasticsearch 5.x. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Moloch.

Install Moloch >= 1.5.
Shut down capture.
Run ./db.pl http://ESHOST:9200 upgrade .
Restart capture.
Verify that everything is working.
Make sure you delete the old indices that db.pl notified you about.

Second outage: Upgrade to Elasticsearch 6.

Make sure you delete the old indices that db.pl notified you about.
Shut down everything.
Upgrade Elasticsearch to 6.x.
WARNING – path.data will have to be updated to access your old data. If you had path.data: /data/foo, you will probably need to change to /data/foo/<clustername>.
Start the Elasticsearch cluster.
Wait for the cluster to go GREEN. This will take LONGER than usual as Elasticsearch upgrades indices from the 5.x to the 6.x format.
```
curl http://localhost:9200/_cat/health
```
Start viewers and captures.

How do I upgrade to Elasticsearch 5.x?

Elasticsearch 5.x is supported by Moloch 0.17.1 for NEW clusters and 0.18.1 for UPGRADING clusters.

Elasticsearch 5.0.x, 5.1.x, and 5.3.0 are NOT supported because of Elasticsearch bugs/issues. We currently use 5.6.7.

WARNING – If you have sessions-* indices created with Elasticsearch 1.x, you can NOT upgrade. Those indices will need to be deleted.

We do NOT provide Elasticsearch 5 startup scripts, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to Elasticsearch 5 may REQUIRE 2 downtime periods of about 5–15 minutes each.

First outage: If you are NOT using Moloch DB version 34 (or later), you must follow these steps while still using Elasticsearch 2.4. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Moloch.

Upgrade to Elasticsearch 2.4.x.
Check for a GREEN Elasticsearch cluster:
```
curl http://localhost:9200/_cat/health
```
Install Moloch 0.18.1 to 0.20.2.
Shut down all capture nodes.
Run ./db.pl http://ESHOST:9200 upgrade .
Start up captures and make sure everything works.
You can remain on Elasticsearch 2.4.x until you want to try Elasticsearch 5.

Second outage: Upgrade to Elasticsearch 5.

You MUST be using Elasticsearch 2.4.x and Moloch DB version 34 (or later) before using Elasticsearch 5 (see above).
Shut down EVERYTHING (Elasticsearch, viewer, capture).
Upgrade Elasticsearch to 5.6.x.
Start the Elasticsearch cluster.
Wait for the cluster to go GREEN. This will take LONGER than usual as Elasticsearch upgrades indices from the 2.x to the 5.x format.
```
curl http://localhost:9200/_cat/health
```
Start viewers and captures.

version conflict, current version [N] is higher or equal to the one provided [M]

This error usually happens when the capture process is trying to update the stats data and falls behind. Arkime will continue to function while this error occurs with the stats or dstats index; however, it does usually mean that your Elasticsearch cluster is overloaded. You should consider increasing your Elasticsearch capacity by adding more nodes, CPU, and/or more memory. If increasing Elasticsearch capacity isn't an option, then reduce the amount of traffic that Arkime processes.
If the N vs. M version numbers are very different from each other, it usually means that you are running two nodes with the same node name at the same time, which is not supported.

Recommended OpenSearch/Elasticsearch Settings

Here are some of our recommended OpenSearch/Elasticsearch settings. Many of these can be updated on the fly, but it is still best to put them in your elasticsearch.yml file. We strongly recommend using the same elasticsearch.yml file on all hosts. Things that need to be different per host can be set with variables.

Disk Watermark

You will probably want to change the watermark settings so you can use more of your disk space. You have the option to use ALL percentages or ALL values, but you can't mix them. The most common sign of a problem with these settings is an error that has FORBIDDEN/12/index read-only / allow delete in it. You can use ./db.pl http://ESHOST:9200 unflood-stage _all to clear the error once you adjust the settings and/or delete some data. Elasticsearch Docs

cluster.routing.allocation.disk.watermark.low: 97%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99%

Or, if you want more control, use values instead of percentages:

cluster.routing.allocation.disk.watermark.low: 300gb
cluster.routing.allocation.disk.watermark.high: 200gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb

Shard Limit

If you have a lot of shards that you want to be able to search against at once Elasticsearch Docs

action.search.shard_count.limit: 100000

Write Queue Limit

No longer need to set since Elasticsearch 7.9. If you hit a lot of bulk failures, this can help, but Elastic doesn't recommend raising too much. In older versions of Elasticsearch, it is named thread_pool.bulk.queue_size, so check the docs for your version. Elasticsearch Docs

thread_pool.write.queue_size: 10000

HTTP Compression

On by default in most versions, allows for HTTP compression. Elasticsearch Docs

http.compression: true

Recovery Time

To speed up recovery times and startup times, there are a few controls to experiment with. Make sure you test them in your environment and slowly increase them because they can break things badly. Elasticsearch Allocation Docs and Elasticsearch Recovery Docs

cluster.routing.allocation.cluster_concurrent_rebalance: 10
cluster.routing.allocation.node_concurrent_recoveries: 5
cluster.routing.allocation.node_initial_primaries_recoveries: 5
indices.recovery.max_bytes_per_sec: "400mb"

Logging

By default, Elasticsearch has logging set to debug level in log4j2.properties. For busy clusters, change this to info level to lower CPU and disk usage.


logger.action.level = info

Using ILM/ISM with Arkime

You can easily use ILM (Elasticsearch) or ISM (OpenSearch) to move indices from hot to warm, force merge, and delete.

We recommend only using ILM with newer versions (7.2+) of Elasticsearch because older versions had some issues. Some versions of Elasticsearch don't include ILM, so you may need to upgrade or subscribe to a paid license.

Once ILM/ISM is enabled, you no longer have to use the db.pl expire cron job but should occasionally run db.pl optimize-admin.

Arkime does NOT currently support the ILM/ISM auto rollover feature, for performance reasons when searching.

These instructions assume you are using db.pl or Arkime UI to set up ILM/ISM and will use a special molochtype attribute name. You can also do this with Kibana to create the ILM/ISM config and not use the molochtype attribute name, but you will then need to do everything on your own. In order for ILM/ISM work correctly with Arkime, follow these five important steps:

If you are using a hot/warm design or might in the future, for each Elasticsearch/OpenSearch node, add a line to your elasticsearch.yml file with node.attr.molochtype: warm or node.attr.molochtype: hot
Create the molochsessions and molochhistory ILM/ISM polices. This can be done easily with Kibana, or we recommend the db.pl ilm or db.pl ism command.
Assign the molochsessions and molochhistory ILM/ISM polices to all the existing indices. Kibana or db.pl ilm or db.pl ism can perform this action.
Change the moloch templates to use the ILM/ISM polices for NEW indices. You'll need to rerun db.pl upgrade ... and add --ilm or --ism to the command. Also add --hotwarm if using a hot/warm design.
Replace the previous db.pl expire cron job with db.pl optimize-admin

Elasticsearch example: to create a new policy that keeps 30 weeks of history, 90 days of SPI data, 1 replica, and optimizes all indices older than 25 hours, you would run: ./db.pl http://localhost:9200 ilm 25h 90d --history 30 --replicas 1 You would then need to run upgrade with all the arguments you usually use, plus --ilm: ./db.pl http://localhost:9200 upgrade --replicas 1 --shards 5 --ilm

OpenSearch example: to create a new policy that keeps 30 weeks of history, 90 days of SPI data, 1 replica, and optimizes all indices older than 25 hours, you would run: ./db.pl http://localhost:9200 ism 25h 90d --history 30 --replicas 1 You would then need to run upgrade with all the arguments you usually use, plus --ism: ./db.pl http://localhost:9200 upgrade --replicas 1 --shards 5 --ism

Capture

What kind of capture machines should I buy?

Arkime prioritizes cost-effective solutions by utilizing standard hardware components. Consider carefully before opting for upgrades like SSDs or high-end network cards. In some cases, adding another server might be a more economical choice. This approach not only reduces individual machine costs but also enhances overall data retention capacity.

Choosing the right machine for the job requires careful consideration. Here are some key factors to keep in mind:

Network traffic storage requirements can be substantial. On average, uncompressed traffic of 1 Gbps can consume around 11 TB of disk space daily. This translates to needing hundreds of terabytes to store just a week's worth of data at moderate traffic levels (2.5 Gbps for 7 days equates to roughly 192.5 TB). Fortunately, compression and excluding full TLS sessions can significantly reduce storage needs.
When calculating bandwidth requirements, consider both incoming (RX) and outgoing (TX) traffic. While a 10 Gbps link is advertised as such, it can handle the combined traffic of up to 20 Gbps, as it allows 10 Gbps in each direction. Remember to include both directions in your calculations to avoid underestimating your needs.
Avoid overloading network links from your NPB. For example, when monitoring a combined traffic of 8 Gbps (4 Gbps RX and 4 Gbps TX), using two separate 10 Gbps from the NPB to capture is recommended. This is because the combined traffic (8 Gbps) is approaching the maximum capacity (10 Gbps) of a single link, potentially leading to dropped packets during bursts.
Arkime requires all packets belonging to the same "5-tuple" (source IP, destination IP, source port, destination port, and protocol) to be processed by the same capture process. Ensure your network visibility architecture can accommodate this requirement.

When selecting Arkime capture boxes, standard "Big Data" boxes might be the best bet ($10k–$25k each). Look for:

CASE: There are many 4RU boxes out there. If space is an issue, there are more expensive 2RU boxes that hold over 16 drives (examples: HPE Apollo 4200 , Supermicro 6028R-E1CR24L, or Dell R740XD2)
MEMORY: 64 GB to 96 GB (or more if running other tools)
OS DISKS: We like RAID 1 small drives. SSDs are nice but not required.
CAPTURE DISKS: 16+ x 16 TB or larger SATA drives. Don't waste money on enterprise/SAS/15k drives.
RAID: A hardware RAID card with at least 1 G cache (2 G is better). We like RAID 5 with 1 hot spare or RAID 6 (with better cards).
NIC: We like newer Mellanox/Intel base NICs. Most newish cards should work fine, however double check the drivers for your Linux distribution.
- 10G NICs are the best value. Scaling Arkime over 10Gbps can be difficult anyway because of disk write performance, unless using flash.
- 25G NICs can be a good choice when connecting to 100G NPB ports. Many NPBs support 4x25G links per 100G port, allowing you quadruple the number of tool hosts you have.
- 40G NICs are not recommended. Instead many NPBs support 4x10G links per 40G NPB port, allowing you to quadruple the number of tool hosts you have if you use 10G ports on the tool hosts.
CPU: At least 2 x 6 cores. The higher the average Gbps, the more speed/cores required.

We are big fans of using network packet brokers (NPBs) ($6k+). They allow multiple taps/mirrors to be aggregated and load balanced across multiple capture machines. Read more in the following sections.

What kind of NPB should I buy?

We are big fans of using NPBs, and we recommend that medium or large Arkime deployments use an NPB. See MolochON 2017 NPB Presentation .

Main advantages:

Easy horizontal scaling of Arkime
Load balancing of traffic
Filtering of traffic before it hits the Arkime boxes
Easier to add more Arkime capacity or other security tools
Don't have to worry as much about new links being added by the network team

Features to look for:

Load balancing
Consistent symmetric hashing (this means each direction of the flow goes out the same tool port)
MPLS/VLAN/VPN stripped (optional—some tools don't like all the headers)
Tool link detection and failover
Automation capability (can you use Ansible/APIs, or are you stuck using a web UI?)
Enough ports to support future tap and tool growth
Whether the features desired require an extra (expensive?) component and/or license

Just like with Arkime, with commodity hardware, you don't necessarily have to pay a lot of money for a good NPB. Some switch vendors offer switches that can operate in switch mode or NPB mode, so you might already have gear you can use.

Sample vendors

Arista – https://www.arista.com/en/solutions/tap-aggregation
Gigamon – https://www.gigamon.com/
Ixia – https://www.ixiacom.com/products/network-packet-brokers

What kind of packet capture speeds can `arkime-capture` handle?

On modern commodity hardware, achieving throughput of 5 Gbps or more is easy, depending largely on the number of CPUs allocated to Arkime and the other tasks the machine is handling. The bottleneck in performance is almost always the speed of storing PCAP to disk! If your disks or RAID can't keep up, you either need to not save as much PCAP using Arkime Rules and other options, select a faster RAID configuration and disks, or give Arkime dedicate disks. For further details, refer to the Architecture and Multiple Host sections. Arkime supports the utilization of multiple threads for both packet acquisition and packet processing.

A simple method to test a local RAID devicee:

dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct

To test a NAS, leave off the oflag=direct and make sure you test with at least 3x the amount of memory so that cache isn't a factor:

dd bs=256k count=150000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test

The output represents the maximum disk performance. If you wish to obtain a more accurate assessment, run several tests and average the results. To avoid packet loss, it's advisable to operate Arkime at no more than 80% of the maximum disk performance. For systems utilizing RAID, aiming for about 60% of this performance metric can further minimize issues, especially during RAID rebuilds. It's important to note that network throughput is typically measured in bits, whereas disk performance is gauged in bytes, requiring the conversion of these measurements for accurate comparison.

Arkime requires full packet captures error

When you have an error about the capture length not matching the packet length, it is NOT an issue with Arkime. The issue is with the network card settings OR how the pcap file was created.

By default modern network cards offload work that the CPUs would need to do. They will defragment packets or reassemble tcp sessions and pass the results to the host. However this is NOT what we want for packet captures, we want what is actually on the network. So you will need to configure the network card to turn off all the features that hide the real packets from Arkime.

The sample config file (/opt/arkime/bin/arkime_config_interfaces.sh) turns off many common features but there are still some possible problems:

If using containers or VMs for Arkime, you may need to turn off the features on the physical interface the VM interface is mapped to from the host OS, instead of inside the container/VM.
If using a fancy card there may be other features that need to be turned off.
1. You can find them usually with ethtool -k INTERFACE | grep on — Anything that is still on, turn off and see if that fixes the problem. Items that says [fixed] can NOT be disabled with ethtool.
2. For example ethtool -K INTERFACE tx off sg off gro off gso off lro off tso off
Sometimes switching to the Arkime AFPacket read method will fix the issue.

Workarounds:

For offline pcap files, make sure they were created without a snaplen, for example with tcpdump use -s 0.
For offline pcap files, you can set readTruncatedPackets=true in the config file, most protocol parsing in Arkime will not work.
(Since 5.0.1) For live pcap capture, you can set readTruncatedPackets=true in the config file.
For both offline and live capture you can increase the max packet length with snapLen=65536 in the config file, this is not recommended.

Why am I dropping packets? (and Disk Q issues)

There are several different types of packet drops and reasons for packet drops:

Arkime Version

Please make sure you are using a recent version of Arkime. Constant improvements are made and it is hard for us to support older versions.

Kernel and TPACKET_V3 support

The most common cause of packet drops with Arkime is leaving the reader default of libpcap instead of switching to tpacketv3, also known as AFPacket. The tpacketv3 reader was greatly improved in Arkime 4.0, so please upgrade. Review the high performance sample config and the tpacketv3 settings to configure the tpacketv3 reader.

Network Card Config

Make sure the network card is configured correctly by increasing the ring buf to max size and turning off most of the card's features. The features are not useful anyway, since we want to capture what is on the network instead of what the local OS sees. Example configuration:

# Set ring buf size, see max with ethool -g eth0
ethtool -G eth0 rx 4096 tx 4096
# Turn off feature, see available features with ethtool -k eth0
ethtool -K eth0 rx off tx off gs off tso off gso off

If Arkime was installed from the deb/rpm and the Configure script was used, this should already be done in /data/moloch/bin/moloch_config_interfaces.sh

MTU Setting

Many network cards will drop packets if the packet MTU is larger than the interface MTU. It is common when sniffing network taffic to have encapsulated packets that are larger than the default MTU of 1500. Make sure the MTU of the capture interface is set to at least 1600 or 9216 if you are using jumbo frames.

packetThreads and the PacketQ is overflowing error

The packetThreads config option controls the number of threads processing the packets, not the number of threads reading the packets off the network card. You only need to change the value if you are getting the Packet Q is overflowing error. The packetThreads option is limited to 24 threads, but usually you only need a few. Configuring too many packetThreads is actually worse for performance, please start with a lower number and slowly increase. You can also change the size of the packet queue by increasing the maxPacketsInQueue setting.

To increase the number of threads the reader uses please see the documentation for the reader you are using on the settings page.

Disk and Disk Q issues

In general errors about the Disk Q being exceeded are NOT a problem with Arkime, but usually an issue with either the hardware or the packet rate exceeding what the hardware can save to disk. You will usually need to either fix/upgrade the hardware or reduce the amount of traffic being saved to disk.

Make sure OpenSearch/Elasticsearch isn't using the same disks.
Make sure swap has been disabled, swappiness is 0, or at the very least, isn't writing to the disk being used for PCAP.
Make sure the RAID isn't in the middle of a rebuild or something worse. Most RAID cards will have a status of OPTIMAL when things are all good and DEGRADED or SUBOPTIMAL when things are bad.
To test the RAID device use:
```
dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
```
This is the MAX disk performance. Run several times if desired and take the average. If you don't want to drop any packets, you shouldn't average more then ~80% of the MAX disk performance. If using RAID and don't want drop packets during a future rebuild, ~60% is a better value. Remember that most network numbers will be in bits while the disk performance will be in bytes, so you'll need to adjust the values before comparing.
Make sure you actually have enough disk write thru capacity and disks. For example, for a 1G link with RAID 5 you may need:
- At least 4 spindles if using a RAID 5 card with write cache enabled.
- At least 8 spindles (or more) if using a RAID 5 card with write cache disabled.
Make sure your RAID card can actually handle the write rate. Many onboard RAID 5 controllers can not handle sustained 1G write rates.
Switch to RAID 0 from RAID 5 if you can live with the TOTAL data loss on a single disk failure.
If you are using xfs make sure you use mount options defaults,inode64,noatime
Don't run capture and OpenSearch/Elasticsearch on the same machine.

If using EMC for disks:

Make sure write cache is enabled for the LUNs.
If it is a CX with SATA drives, RAID-3 is optimized for large sequential I/O.
Monitor EMC lun queue depth, may be too many hosts sharing it.

To check your disk IO run iostat -xm 5 and look at the following:

wMB/s will give you the current write rate, does it match up with what you expect?
avgqu-sz should be near or less then 1, otherwise linux is queueing instead of doing
await should be near or less then 10, otherwise the IO system is slow, which will slow capture down.

Other things to do/check:

If using RAID 5 make sure you have write cache enabled on the RAID card. Sometimes this is called WriteBack. Make sure the BBU is still good or you have write cache enabled even when the BBU isn't working or missing.

Adaptec Example: arcconf SETCACHE 1 LOGICALDRIVE 1 WBB
HP Example: hpssacli ctrl slot=0 modify dwc=enable
MegaCLI: MegaCli64 -LDSetProp -ForcedWB -Immediate -Lall -aAll ; MegaCli64 -LDSetProp Cached -L0 -a0 -NoLog

Other

There are conflicting reports that disabling irqbalancer may help.
Check that the CPU you are giving capture isn't handling lots of interrupts (cat /proc/interrupts).
Make sure other processes aren't using the same CPU as capture.

WISE

Cyclical packet drops may be caused by bad connectivity to the wise server. Verify that the WISE responds quickly
```
curl http://arkime-wise.hostname:8081/views
```
on the capture host that is dropping packets.

High Performance Settings

See settings

How do I import existing PCAPs?

Think of the capture binary much like you would tcpdump. The capture binary can listen to live network interface(s), or read from historic packet capture files. Currently Arkime works best with PCAP files, not PCAPng.

/opt/arkime/bin/capture -c [config_file] -r [PCAP file]

For an entire directory, use -R [PCAP directory]

See /opt/arkime/bin/capture --help for more info. The --monitor to monitor non NFS directories, --skip to skip already loaded PCAP files, and -R to process directories are common options. Multiple -r and -R options can be used.

If Arkime is failing to load a PCAP file check the following things:

Use PCAP formatted files and not PCAPng
Make sure the PCAP files contain IP traffic, Arkime currently ignores ARP and other traffic.
Try running capture with --debug which might warn of not understanding the link type or GRE tunnel type. (Please open issues for unknown link or GRE types)

By default importing offline pcap does NOT make a copy of the pcap file, Arkime saves a reference to the original file, which shows up as locked on the files tab. If you want to make a copy of the pcap file, use the --copy option with capture.

Enable Arkime UI to upload

It is also possible to enable UI in Arkime to upload PCAP. This is less efficient then just using capture directly, since it uploads the file and then runs capture for you. Just uncomment the uploadCommand in the config.ini file.

How do I monitor multiple interfaces?

The easy way is using the interface setting in your config.ini. It supports a semicolon ';' separated list of interfaces to listen on for live traffic. If you want to set a tag or another field per interface, use the interfaceOps setting.

The hard way, you can also have multiple capture processes,.

Arkime by default uses the unqualified hostname as the name of the Arkime node, so you'll need to come up with a naming scheme. Appending a, b, c, … or the interface number to the hostname are possible methods.
Edit /opt/arkime/etc/config.ini, and create a section for each of the Arkime nodes. Assuming the defaults are correct in the [default] section, the only thing that MUST be set is the interface item. It is also common to have each Arkime node talk to a different OpenSearch/Elasticsearch node if running a cluster of OpenSearch/Elasticsearch nodes. The arkime-m01 is an EXAMPLE node name.
```
[arkime-m01a]
interface=eth2
[arkime-m01b]
interface=eth5
```
If hostname + domainname on the machine doesn't return a FQDN, you'll also need to set a viewUrl, or easier use the --host option.

You'll need two systemd scripts and modify them to use the two different node names. Something like.

mv /etc/systemd/system/arkimecapture.service arkimecapture1.service
cp /etc/systemd/system/arkimecapture1.service /etc/systemd/system/arkimecapture2.service

Now you'll need to edit those two files and add the -n options to the ExecStart lines after the capture. So something like ExecStart=/bin/sh -c '/opt/arkime/bin/capture -n arkime-m01a -c /opt/arkime/etc/config.ini and ExecStart=/bin/sh -c '/opt/arkime/bin/capture -n arkime-m01b -c /opt/arkime/etc/config.ini
Now you can use systemd to start them. systemctl daemon-reload; systemctl start arkimecapture1; systemctl start arkimecapture2

You only need to run one viewer on the machine. Unless it is started with the -n option, it will still use the hostname as the node name, so any special settings need to be set there (although default is usually good enough).

Arkime capture crashes

Please file an issue on github with the stack trace.

You'll need to allow suid or user changing programs to save core dumps. Use sysctl to change until the next reboot. Setting it to 0 will change it back to the default.
```
sysctl -w fs.suid_dumpable=2
```
The user that Arkime switches to must be able to write to the directory that capture is running in.
Run capture and get it to crash.
Look for the most recent core file.
Run gdb (you may need to install the gdb package first)
```
gdb /opt/arkime/bin/capture corefilename
```
Get the back trace using the bt command

If it is easy to reproduce, sometimes it's easier to just run gdb as root:

Run gdb capture as root.
Start Arkime in gdb with run ALL_THE_ARGS_USED_FOR_ARKIME-CAPTURE_GO_HERE.
Wait for crash.
Get the backtrace using bt command.
Sometimes, you need to put a break point in g_log b g_log

ERROR - pcap open failed

Usually capture is started as root so that it can open the interfaces and then it immediately drops privileges to dropUser and dropGroup, which are by default nobody:daemon. This means that all parent directories need to be either owned or at least executable by nobody:daemon and that the pcapDir itself must be writeable.

How to reduce amount of traffic/pcap?

Listed in order from highest to lowest benefit to Arkime

Setting the bpf= filter will stop Arkime from seeing the traffic.
Adding CIDRs to the packet-drop-ips section will stop Arkime from adding packets to the PacketQ
Using Rules it is possible to control if the packets are written to disk or the SPI data is sent to OpenSearch/Elasticsearch

Life of a packet

Arkime capture supports many options for controlling which packets are captured, processed, and saved to disk.

The first gatekeeper and most important is the bpf filter, bpf= in the config file. This filter can be implemented in the kernel, the network card, libpcap or network drivers. It is a single filter and it controls what Arkime capture "sees" or doesn't "see". Any packet that is dropped because of the bpf filter is usually not counted in ANY Arkime stats, but some implementation do expose stats.
Arkime does a high level decode of the ethernet, IP, IP protocol information and sees if it understands it. If it doesn't supports it, Arkime will discard the packet.
Arkime checks the packet-drop-ips config section to see if the IPs involved are marked to be discarded. If there are only a few IPs to drop then bpf= should be used, otherwise this is much more efficient then a huge bpf.
For TCP packets, Arkime checks against previous matched against rules that set a _dropByDst or _dropBySrc timeout, if it matches they will be discarded.
Arkime picks a packet queue to send the packet to, if the packet queue is too busy it will drop the packet. Potentially increase packetThreads or maxPacketsInQueue if too many packets are being dropped here.
A packet queue will start processing a packet and update all the stats and basic information for the session the packet is associated with.
The sessionSetup rules for first packets in a session are executed, which might set operations that control packet saving.
If this is the first packet of the session the packet queue will then check all the dontSaveBPFs, and if one matches it will save off the max number of packets to save for the session. This will override the maxPackets config setting.
If this is the first packet of the session AND no dontSaveBPFs matched, the packet queue will then check all the minPacketsSaveBPFs and save off a min number of packets that must be received.
Finally Arkime goes to save the packet, if it has already saved the max number of packets for the session (set by rules or dontSaveBPFs) OR if there was another method (plugin) that said stop saving packets for the session the packet won't be saved.
If the number of packets for the session is greater then maxPackets the session will be saved, a new linked session will be started for future packets. The beforeMiddleSave and beforeBothSave rules will be executed before saving.
The packet queue sends the packet off to the various classifiers and parsers to gather more meta data. The afterClassify rules will be executed, and if any fields are set during this processing the fieldSet rules will be executed. Rules may change if future packets are saved.
At some point in the future the session will hit one of the timeouts and the session will be saved if there have been enough packets saved to meet the min number of packets received setting per session. (Defaults to 0) The beforeFinalSave and beforeBothSave rules will be executed.

PCAP Deletion

Arkime does NOT support having pcapDir and the OpenSearch/Elasticsearch data directory on the same file system. Arkime will NOT work in this configuration as the tools fight for space.

PCAP deletion is handled by the viewer process, so it is important that viewer is running on all capture instances. The viewer process checks on startup and then every minute to see how much space is available. When free space is below freeSpaceG, then viewer will start deleting the oldest PCAP files. The viewer process logs every time a file is deleted, so it is possible to figure out when a file is deleted if needed. If viewer complains about not finding the PCAP file, make sure you check the viewer.log for errors.

Note: freeSpaceG can be a number freeSpaceG=1000 or a percentage, with freeSpaceG=5% the default. The viewer process will always leave at least 10 PCAP files on the disk, so make sure there is room for at least maxFileSizeG * 10 capture files on disk, or by default 120G.

If still having PCAP delete issues:

Verify freeSpaceG is set correctly for the node in the config file
Restart viewer after turning on debugging by adding debug=2 in the [default] section of your config.ini file. After restarting viewer, check the viewer.log for messages or use grep -i expire /opt/arkime/logs/viewer.log to see the relevant messages
Make sure there is free space where viewer is writing its logs if you don't see any messages in viewer.log
Verify that dropUser or dropGroup can actually delete files in the PCAP directory and has read/execute permissions in all parent directories. So for example you need to check the /opt and the /opt/arkime and the /opt/arkime/raw directory permissions. The PCAP files will have read/write permissions which is normal.
Make sure the PCAP directory is on a filesystem with at least maxFileSizeG * 10 space available.
If there is a mismatch between the files in the directory and the files on the Files tab run the db.pl http://localhost:9200 sync-files command
Make sure the files in the file tab don't have locked set, viewer won't deleted locked files
Try restarting viewer
If using seilnux (sestatus) temporarily disable it (setenforce 0) and see if that fixes the problem.

dontSaveBPFs doesn't work

There are several common reasons dontSaveBPFs might not work for you.

Look at the saved PCAP, not the packet count in the UI, Arkime will still count the number of packets, it just won't save them
Make sure you've spelled it dontSaveBPFs, case matters
Make sure you've placed dontSaveBPFs in the correct section, you can verify by adding a --debug to capture when starting and looking at the output
Turns out BPF filters are tricky. :) When the network is using vlans, then at compile time, BPFs need to know that fact. So instead of a nice simple dontSaveBPFs=tcp port 443:10 use something like dontSaveBPFs=tcp port 443 or (vlan and tcp port 443):10. Basically FILTER or (vlan and FILTER). Information from here.
Try testing your filter manually with tcpdump, you should only see the traffic you want to drop. So something like tcpdump -i INTERFACE tcp port 443 for example.

If still having issues, you might just try out a Arkime Rules file. Arkime converts dontSaveBPFs into a rule for you behind the scenes, so Arkime Rules are actually more powerful.

Zero or missing bytes PCAP files

Arkime optimizes disk writes for efficiency, making it highly suitable for high bandwidth networks. However, this approach might not be ideal for low bandwidth environments. The amount of data Arkime buffers before writing to disk is determined by the pcapWriteSize setting, which has a default value of 262144 bytes. It's crucial to remember that this buffering occurs on a per-thread basis. Therefore, for low bandwidth networks, it's advisable to set packetThreads to 1 (a single thread). The system is designed to flush the buffered PCAP to disk after 10 seconds of inactivity, but direct-io requires pagesize bytes to still be buffered, typically around 4096 bytes.

Encountering an error message such as ERROR - processSessionIdDisk - SESSIONID in file FILENAME couldn't read packet at FILEPOS packet # 0 of 2 or Not enough data 0 for header 16 usually indicates that the PCAP data is either still in the process of buffering and requires more time to be fully written to disk, or it suggests that a capture process or system crash occurred before the PCAP data could be saved. It may be useful to turn OFF compression simpleCompression=none on low bandwidth networks since compression causes more data to be buffered.

Note, running out of disk space can lead to the creation of numerous zero-byte PCAP files. For details on managing disk space and preventing such issues, refer to PCAP Deletion.

Can I virtualize Arkime with KVM using OpenVswitch?

In small environments with low amounts of traffic this is possible. With Openvswitch you can create mirror port from a physical or virtual adapter and send the data to another virtual NIC as the listening interface. In KVM, one issue is that it isn't possible to increase the buffer size past 256 on the adapter using the Virtio network adapter (mentioned in another part of the FAQ). Without Arkime capture will continuously crash. To solve this in KVM, use the E1000 adapter, and configure the buffer size accordingly. Set up the SPAN port on Openvswitch to send traffic to it: https://www.rivy.org/2013/03/configure-a-mirror-port-on-open-vswitch/.

Installing MaxMind Geo free database files

MaxMind recently changed how you download their free database files. You now need to signup for an account and setup the geoipupdate program. If using a version of Moloch before 2.2, you will need to edit your config.ini file and update the geolite paths.

Instructions:

Sign up for a MaxMind account (no purchase required)
Wait for MaxMind email and set your password
Install the geoipupdate tool, pay attention to version installed, for many distributions you can just do a yum install geoipupdate or apt install geoipupdate
Create a license key
Select Yes when asked "Will this key be used for GeoIP Update?" and select the version you have
Use the MaxMind feature to generate a config file for you, usually you will replace /etc/GeoIP.conf with this file
Run geoipupdate as root and see if it works
If you are using Moloch before 2.2, update your /data/moloch/etc/config.ini file so that geoLite2Country is now /usr/share/GeoIP/GeoLite2-Country.mmdb and geoLite2ASN is now /usr/share/GeoIP/GeoLite2-ASN.mmdb
Restart capture

What do these log lines mean?

Arkime logs a lot of information for debugging purposes. Much of this information is for bug reports, but can also be used to figure out what is going on. You may need to use --debug to enable these msgs.

HTTP Responses

Jan 01 01:01:01 http.c:369 moloch_http_curlm_check_multi_info(): 8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms

Jan 01 01:01:01	Date
http.c:369	File Name:Line Number
moloch_http_curlm_check_multi_info	Function Name
8000/30	8000 queued requests to server 30 connections to server
ASYNC	Asynchronous request, SYNC for Synchronous request
200	HTTP status code
http://eshost:9200/_bulk	Requested URL
250032/5439	250342 bytes uploaded (CURLINFO_SIZE_UPLOAD) 5439 bytes downloaded (CURLINFO_SIZE_DOWNLOAD)
14ms	14ms to connect to server (CURLINFO_CONNECT_TIME)
12345ms	12345ms total request time (CURLINFO_TOTAL_TIME)

Periodic Packet Progress

Jan 01 01:01:01 packet.c:1185 arkime_packet_log(): packets: 3911000000 current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 pstats: 4132185901/1/2/3/4/5/6

Jan 01 01:01:01	Date
packet.c:1185	File Name:Line Number
arkime_packet_log	Function Name
packets: 3911000000	3911000000 packets are going to be processed by the packet queues. These packets have made it past corrupt checks, packet-drop-ips checks, and are ones we most likely understand.
current session: 41771/45251	41771 monitored sessions of the current session type (usually tcp) 45251 monitored sessions total
oldest: 0	In the current session type queue, the oldest session should be idled out in 0 seconds
recv: 4028852297	4028852297 packets have been received by the interface since capture start, as reported by the reader's stats api
drop: 123	123 packets have been dropped by the interface since capture started, as reported by the reader's stats api
(0.00)	0.00% packets have been dropped by the interface since capture started, as reported by the reader's stats api
queue: 1	1 bulk request is waiting to be sent to the OpenSearch/Elasticsearch servers, each bulk request may hold multiple sessions
disk: 2	2 disk buffers writes are outstanding, each buffer will hold multiple packets
packet: 3	3 packets are waiting to be processed in all the packet queues
close: 4	4 tcp sessions have been marked for closing (RST/FIN), waiting on last few packets
ns: 5	5 sessions are ready to be saved but there is a plugin that is doing async work, such as WISE
frags: 0/1988	always 0 1988 current ip frags waiting to be matched
pstats: 4132185901/1/2/3/4/5/6	4132185901 packets successfully sent to a packet queue 1 packet dropped because of packet-drop-ips config 2 packets dropped because the packet queues were overloaded 3 packets dropped because they were corrupt 4 packets dropped because how to process was unknown to us 5 packets dropped because of ipport rules 6 packets dropped because of packet deduping (2.7.1 enablePacketDedup)

Viewer

Where do I learn more about the expressions available

Click on the owl and read the Search Bar section. The Fields section is also useful for discovering fields you can use in a search expression.

Exported PCAP files are corrupt, sometimes session detail fails

The most common cause of this problem is that the timestamps between the Arkime machines are different. Make sure ntp is running everywhere, or that the time stamps are in sync.

Map counts are wrong

The source and destination IPs are each counted, so the map should total twice the number of sessions.
Currently OpenSearch/Elasticsearch only has accurate counts up to 2 billion uniques.
Some countries aren't shown, but can still be searched using their ISO-3 (< 1.0) or ISO-2 (>= 1.0).

What browsers are supported?

Recent versions of Chrome, Firefox, and Safari should all work fairly equally. Below are the minimum versions required. We aren't kidding.

Arkime Version	Chrome	Firefox	Opera	Safari	Edge	IE
Prior to 3.0	53	54	40	10	14	Not Supported
3.x, 4.x	80	74	67	13.1	80	Not Supported
5.x and beyond	92	95	78	15.4	92	Not Supported

Development and testing is done mostly with Chrome on a Mac, so it gets the most attention.

Error: getaddrinfo EADDRINFO

This seems to be caused when proxying requests from one viewer node to another and the machines don't use FQDNs for their hostnames and the short hostnames are not resolvable by DNS. You can check if your machine uses FQDNs by running the hostname command. There are several options to resolve the error:

Use the --host option on capture
Configure the OS to use FQDNs.
Make it so DNS can resolve the shortnames or add the shortnames to the hosts file.
Edit config.ini and add a viewUrl for each node. This part of the config file must be the same on all machines (we recommend you just use the same config file everywhere). Example:
```
[node1_eth0]
interface=eth0
viewUrl=http://node1.fqdn
[node1_eth1]
interface=eth1
viewUrl=http://node1.fqdn
[node2]
interface=eth1
viewUrl=http://node2.fqdn
```

How do I proxy Arkime using Apache

Apache, and other web servers, can be used to provide authentication or other services for Arkime when setup as a reverse proxy. When a reverse proxy is used for authentication it must be inline, and authentication in Arkime will not be used, however Arkime will still do the authorization. Arkime will use a username that the reverse proxy passes to Arkime as a HTTP header for settings and authorization. See the architecture page for diagrams. While operators will use the proxy to reach the Arkime viewer, the viewer processes still need direct access to each other.

If you are using SElinux in enforcing mode you may need to make changes for things to work, or disable SElinux. It has been reported that
```
setsebool -P httpd_can_network_connect 1
```
is required.
Install Apache, turn on the auth method of your choice. This example also uses HTTPS from Apache to Arkime, but if on localhost that isn't required. Configure it to set a special header for Arkime to check. In this example ARKIME_USER is the header that is being set from a variable, if your auth method already sets a header use that.
```
AuthType your_auth_method
Require valid-user
RequestHeader set ARKIME_USER %{your_auth_method_concept_of_username_variable_probably_REMOTE_USER}e
```

Make sure mod_ssl is loaded, and set up a SSL proxy:

SSLProxyEngine On
#ProxyRequests On # You probably don't want this line
ProxyPass        /arkime/ https://localhost:8005/ retry=0
ProxyPassReverse /arkime/ https://localhost:8005/

Restart Apache.
Using the Arkime UI (by going directly to a non proxy Arkime) make sure the "Web Auth Header" is checked for the users.
Edit Arkime's config.ini
- Create a new arkime-proxy section (you can use any name) for the Arkime proxy.
- Set userNameHeader to the lower case version of the header Apache is setting. NOTE - the userNameHeader setting is only needed on viewers that apache talks to, don't set on all of them.
- Set the webBasePath to the ProxyPath location used above. All other sections should NOT have a webBasePath.
- Add a viewHost=localhost, so externals can't just set the userNameHeader and access Arkime with no auth:
```
[arkime-proxy]
userNameHeader=arkime_user
webBasePath = /arkime/
viewPort = 8005
viewHost = localhost
```
Start the arkime-proxy viewer, so for this example you would need to add -n arkime-proxy to your systemd file (/etc/systemd/system/molochviewer.service by default) on the ExecStart line after viewer.js so viewer uses that section
To prevent the users from going directly to Arkime in the future, scramble their passwords. You might want to leave an admin user that doesn't use the Apache auth. Or you can temporarily add one with the addUser.js script.
If experiencing issues, try running viewer with --debug, add by editing the systemd file and restarting viewer

I still get prompted for password after setting up Apache auth

Make sure the user has the "Web Auth Header" checked
Make sure in the viewer config userNameHeader is the lower case version of the header Apache is using.
Run viewer.js with a --debug and see if the header is being sent.

How do I search multiple Arkime clusters

It is possible to search multiple Arkime clusters by setting up a special Arkime MultiViewer and a special MultiES process. The MultiES process is similar to Elasticsearch tribe nodes, except it was created before tribe nodes and can deal with multiple indices having the same name. The MultiViewer talks to MultiES instead of a real OpenSearch/Elasticsearch instance. Currently one big limitation is that all Arkime clusters must use the same serverSecret.

To use MultiES, create another config.ini file or section in a shared config file. Both multies.js and the special "all" viewer can use the same node name. See Multi Viewer Settings for more information.

# viewer/multies node name (-n allnode)
[allnode]
# The host and port multies is running on, set with multiESHost:multiESPort usually just run on the same host
elasticsearch=127.0.0.1:8200
# This is a special multiple arkime cluster viewer
multiES=true
# Port the multies.js program is listening on, elasticsearch= must match
multiESPort = 8200
# Host the multies.js program is listening on, elasticsearch= must match
multiESHost = localhost
# Semicolon list of OpenSearch/Elasticsearch instances, one per arkime cluster.  The first one listed will be used for settings
# You MUST have a name set
multiESNodes = http://escluster1.example.com:9200,name:escluster1,prefix:PREFIX;http://escluster2.example.com:9200,name:escluster2
# Uncomment if not using different rotateIndex settings
#queryAllIndices=false

Now you need to start up both the multies.js program and viewer.js with the same config file AND -n allnode. All other viewer settings, including webBasePath can still be used.

By default, the users table comes from the first cluster listed in multiESNodes. This can be overridden by setting usersElasticsearch and optionally usersPrefix in the multi viewer config file.

How do I use self-signed SSL/TLS Certificates with MultiES?

Since 4.2.0 MultiES supports the caTrustFile setting.

Priority to 4.2.0 you will need to create a file, for example CAcerts.pem, containing one or more trusted certificates in PEM format.

Then, you need start MutilES adding NODE_EXTRA_CA_CERTS environment variable specifying the path to file you just created, for example:

NODE_EXTRA_CA_CERTS=./CAcerts.pem /opt/arkime/bin/node multies.js -c /opt/arkime/etc/config.ini -n allnode

How do I reset my password?

An admin can change anyone's password on the Users tab by clicking the Settings link in the Actions column next to the user.

A password can also be changed by using the addUser script, which will replace the entire account if the same userid is used. All preferences and views will be cleared, so creating a secondary admin account may be a better option if you need to change an admin users password. After creating a secondary admin account, change the users password and then delete the secondary admin account.

node addUser -c <configfilepath> <user id> <user friendly name> <password> [--admin]

Error: Couldn't connect to remote viewer, only displaying SPI data

Viewers have the ability to proxy traffic for each other. The ability relies on Arkime node names that are mapped to hostnames. Common problems are when systems don't use FQDNs or certs don't match.

How do viewers find each other

First the SPI records are created on the capture side.

Each capture gets a nodename, either by the -n command line option or everything in front of the first period of the hostname.
Each capture writes a stats record every few seconds that has the mapping from the nodename to the FDQN. It is possible to override the FDQN with the --host option to capture.
Each SPI record has a nodename in it.

When PCAP is retrieved from a viewer it uses the nodename associated with the SPI record to find which capture host to connect to.

Each arkime-viewer process gets a nodename, either by the -n command line option or everything in front of the first period of the hostname.
If the SPI record nodename is the same as the arkime-viewer nodename it can be processed locally, STOP HERE. This is the common case with one arkime node.
If the stats[nodename].hostname is the same as the arkime-viewer's hostname (exact match) then it can be processed locally, STOP HERE. Remember this is written by capture above, either the FQDN or --host. This is the common case with multiple capture processes per capture node.
If we make it here, the PCAP data isn't local and it must be proxied.
If there is a viewUrl set in the [nodename] section, use that.
If there is a viewUrl set in the [default] section, use that.
Use stats[nodename].hostname:[nodename section - viewPort setting]
Use stats[nodename].hostname:[default section - viewPort setting]
Use stats[nodename].hostname:8005

Possible fixes

First, look at viewer.log on both the viewer machine and the remote machine and see if there are any obvious errors. The most common problems are:

Not using the same config.ini on all nodes can make things a pain to debug and sometimes not even work. It is best to use the same config with different sections for each node name [nodename]
The remote machine doesn't return a FQDN from the hostname command AND the viewer machine can't resolve just the hostname. To fix this, do ONE of the following:
1. Use the --host option to capture and restart capture
2. Make it so the remote machines returns a FQDN (hostname "fullname" as root and edit /etc/sysconfig/network)
3. Set a viewUrl in each node section of the config.ini. If you don't have a node section for each host, you'll need to create one.
4. Edit /etc/resolv.conf and add search foo.example.com, where foo.example.com is the subdomain of the hosts. Basically, you want it so "telnet shortname 8005" works on the viewer machine to the remote machine.
The remote machine's FQDN doesn't match the CN or SANs in the cert it is presenting. The fixes are the same as #2 above.
The remote machine is using a self signed cert. To fix this, either turn off HTTPS or see the certificate answer above.
The remote machine can't open the PCAP. Make sure the dropUser user or dropGroup group can read the PCAP files. Check the directories in the path too.
Make sure all viewers are either using HTTPS or not using HTTPS, if only some are using HTTPS then you need to set viewUrl for each node.
1. When troubleshooting this issue, it is sometimes easier to disable HTTPS everywhere
If you want to change the hostname of a capture node:
1. Change your mind :)
2. Reuse the same node name as previously with a -n option
3. Use the viewUrl for that old node name that points to the new host.

Compiled against a different Node.js version error

Arkime uses Node.js for the viewer component, and requires many packages to work fully. These packages must be compiled with and run using the same version of Node.js. An error like … was compiled against a different Node.js version using NODE_MODULE_VERSION 48. This version of Node.js requires NODE_MODULE_VERSION 57. means that the version of Node.js used to install the packages and run the packages are different.

This shouldn't happen when using the prebuilt Arkime releases. If it does, then double check that /opt/arkime/bin/node is being used to run viewer.

If you built Arkime yourself, this usually happens if you have a different version of node in your path. You will need to rebuild Arkime and either:

Remove the OS version of node
Make sure /opt/arkime/bin is in your path before the OS version of node
Use the --install option to easybutton which will add to the path for you

How do I change the port viewer listens on?

By default viewer listens on port 8005. Changing this can be tricky, especially for a port less than 1024, like 443. You should definitely read the How do viewers find each other section.

Scenario	Solutions
Change all nodes to port > 1024	Set viewPort in [default] section on ALL nodes
Change single node port < 1024, remaining nodes (if any) unchanged	Usually unless a program runs as root it can NOT listen to ports less than 1024. Since viewer by default drops privileges before listening, even if you start as root, it isn't root anymore when trying to listen on the port. Possible solutions are: Use a reverse proxy like Apache/Nginx. This is a great option for a central node that needs to be behind SSO, and all other nodes are blocked from users directly using Use iptables to forward from new port to 8005. Something like `iptables -t nat -I PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8005` Fool around with the systemd CAP_NET_BIND_SERVICE setting Comment out the dropUser setting and change the viewPort setting in a [$nodename] section.
All nodes, port < 1024	Just don't. :) If you must, most of the solutions above will work, but don't do the reverse proxy solution since viewer nodes need to talk to each other WITHOUT external authentication.

Hunts are not working

Hunts require a single viewer node be specified to actually coordinate the hunts. Since 4.3.1 the preferred method is to set cronQueries=auto in the [default] section of the config.ini file for nodes that should be eligible to run hunts. This will automatically select a single node to run hunts, and if that node goes down, another node will be selected. If using central viewers, it is recommended to only set cronQueries=auto on the central viewers.

You can also force hunts to run on a specific node by setting cronQueries=true in the [default] section of the config.ini file. You must only set cronQueries=true on one and only one node.

If cronQueries is properly set up on a single node, and hunts still aren't working, make sure the cronQueries node is running and checking in. You can check this on the Stats -> ES Nodes tab and/or check the viewer logs.

See more information about the cronQueries setting here.

Fields are missing or showing an error

Arkime stores which fields are available in the arkime_fields index. The index is populated by db.pl when the Arkime cluster is created and by capture. New fields are added with new versions of Arkime, new plugins, wise or the custom fields feature. Arkime rarely deletes fields from the index, except for major reindexing.

The viewer application checks this index on start up and periodically for any changes. It can take around 10 minutes for new fields to show up in the web application since the viewer caches the fields.

If fields are missing the solution is almost always

Run capture so it has a chance to re-add any missing fields
Restart viewer or wait 10 minutes
Reload the web page so it is forced to refetch the fields from viewer

Cont3xt

Cont3xt is not working

Here is the common check list:

Check that Cont3xt integrations are configured and not disabled
```
curl http://localhost:3218/settings#integrations
```
Verify there are no errors in the /opt/arkime/logs/cont3xt.log file.
Restart cont3xt after adding a debug=2 in the [cont3xt] section may print out useful information what is wrong.

Parliament

Sample Apache Config

Parliament is designed to run behind a reverse proxy such as Apache. Basically, you just need to tell Apache to send all root requests and any /parliament requests to the Parliament server.

ProxyPassMatch   ^/$ http://localhost:8008/parliament retry=0
ProxyPass        /parliament/ http://localhost:8008/parliament/ retry=0

WISE

WISE is not working

Here is the common check list:

Check that WISE is running
```
curl http://localhost:8081/fields
```
You should see a list of fields that WISE knows about.
Check in your config.ini file you've added
- wise.so to the plugins= line.
- wise.js to the viewerPlugins= line.
- wiseURL has been set, or the older wiseHost and wisePort
Check that from the capture/viewer hosts you can reach the viewer hosts and there are no ACL issues.
```
curl http://WISEHOST:8081/fields
```
Restart capture after adding a --debug option may print out useful information what is wrong. Look to make sure that WISE is being called with the correct URL. Verify that the plugins, wiseHost and wiseURL setting is what you actually think it is.

arkime.com

How can I contribute?

Want to add or edit this FAQ? Found an issue on this site? This site's code is open source. Please contribute!

General

Why should I use Arkime?

How do you pronounce our name?

Upgrading Arkime

What operating systems are supported?

Arkime is not working

How do I reset Arkime?

Self-Signed or Private CA TLS Certificates

How do I upgrade to Moloch 1.x?

How do I upgrade to Moloch 2.x?

How do I upgrade to Arkime 3.x?

Breaking Changes

Instructions

How do I upgrade to Arkime 4.x?

Breaking Changes

Instructions

How do I upgrade to Arkime 5.x?

Breaking Changes

Instructions

How do I upgrade to Arkime 6.x?

Breaking Changes

Instructions

OpenSearch/Elasticsearch

How many OpenSearch/Elasticsearch nodes or machines do I need?

Data never gets deleted

ERROR - Dropping request

When do I add additional nodes? Why are queries slow?

Removing Nodes

How to enable OpenSearch/Elasticsearch replication

How do I upgrade OpenSearch/Elasticsearch?

How do I upgrade to Elasticsearch 9.x?

How do I upgrade to Elasticsearch 8.x?

How do I upgrade to Elasticsearch 7.x?

How do I upgrade to Elasticsearch 6.x?

How do I upgrade to Elasticsearch 5.x?

version conflict, current version [N] is higher or equal to the one provided [M]

Recommended OpenSearch/Elasticsearch Settings

Disk Watermark

Shard Limit

Write Queue Limit

HTTP Compression

Recovery Time

Logging

Using ILM/ISM with Arkime

Capture

What kind of capture machines should I buy?

What kind of NPB should I buy?

What kind of packet capture speeds can arkime-capture handle?

Arkime requires full packet captures error

Why am I dropping packets? (and Disk Q issues)

Arkime Version

Kernel and TPACKET_V3 support

Network Card Config

MTU Setting

packetThreads and the PacketQ is overflowing error

Disk and Disk Q issues

Other

WISE

High Performance Settings

How do I import existing PCAPs?

Enable Arkime UI to upload

How do I monitor multiple interfaces?

Arkime capture crashes

ERROR - pcap open failed

How to reduce amount of traffic/pcap?

Life of a packet

PCAP Deletion

dontSaveBPFs doesn't work

Zero or missing bytes PCAP files

Can I virtualize Arkime with KVM using OpenVswitch?

Installing MaxMind Geo free database files

What do these log lines mean?

HTTP Responses

Periodic Packet Progress

Viewer

Where do I learn more about the expressions available

Exported PCAP files are corrupt, sometimes session detail fails

Map counts are wrong

What browsers are supported?

Error: getaddrinfo EADDRINFO

What kind of packet capture speeds can `arkime-capture` handle?