If you're in search of a comprehensive, standalone open-source solution for full packet capture (FPC) and network analysis that includes both metadata parsing and searching capabilities, Arkime stands out as the premier choice. Full packet capture systems are indispensable tools for network and security analysts, offering an unfiltered view of network activities and enabling a detailed analysis of events from a network perspective. As an open-source platform, Arkime affords its users total control over both deployment and architecture, ensuring that you can tailor the system to meet your specific requirements. While there are other FPC systems available, Arkime's unique blend of features and open-source accessibility makes it a standout solution for those needing detailed and actionable network insights.
(/ɑːrkɪˈmi/) Read more about why we changed our name here.
Upgrading Arkime is a sequential process that requires installing each major version in the order outlined in the table below.
This step-by-step approach ensures that your system transitions smoothly between versions without missing critical updates or features.
If you find that your current version of Arkime is not explicitly mentioned in the chart, it is recommended to upgrade to the immediately higher version listed.
By following this method, you can sequentially upgrade through the major releases until your system is up-to-date.
Typically, transitioning between versions requires the execution of the db.pl
upgrade script.
This script is designed to update your database schemas and configurations to be compatible with the newer version of Arkime.
Unless specified otherwise in the upgrade documentation, running this script should be the only additional step needed to complete the upgrade process.
Name | Version | Min Version to Upgrade From | OpenSearch Versions | Elasticsearch Versions | Special Instructions | Notes |
---|---|---|---|---|---|---|
Arkime | 5.0+ | 4.3.2+ (4.6.0 recommended) | 1.0.0+ (2.10+ recommended) | 7.10+, 8+ (7.17+, 8.10+ recommended) | Arkime 5.x instructions | |
Arkime | 4.0+ | 3.3.0+ (3.4.0 recommended) | 1.0.0+ (2.3+ recommended) | 7.10+, 8+ | Arkime 4.x instructions | |
Arkime | 3.0+ | 2.4.0 | 1.0.0+ (2.3 recommended) | 7.10+, not 8.x | Arkime 3.x instructions | |
Arkime | 2.7+ | 2.0.0 | N/A | 7.4+ (7.9.0+ recommended, 7.7.0 broken) | Elasticsearch 7 instructions | |
Moloch | 2.2+ | 1.7.0 (1.8.0 recommended) | N/A | 6.8.2+ (6.8.6+ recommended), 7.1+ (7.8.0+ recommended, 7.7.0 broken) | Moloch 2.x instructions | Must already be on 6.8.x or 7.1+ before upgrading to 2.2 |
Moloch | 2.0, 2.1 | 1.7.0 (1.8.0 recommended) | N/A | 6.7, 6.8, 7.1+ | Moloch 2.x instructions | Must already be on Elasticsearch 6.7 or 6.8 (Elasticsearch 6.8.6 recommended) before upgrading to 2.0 |
Moloch | 1.8 | 1.0.0 (1.1.x recommended) | N/A | 5.x or 6.x | Elasticsearch 6 instructions | Must have finished the 1.x reindexing; stop captures for best results |
Moloch | 1.1.1 | 0.20.2 (0.50.1 recommended) | N/A | 5.x or 6.x (new only) | Instructions | Must be on Elasticsearch 5 already |
Moloch | 0.20.2 | 0.18.1 (0.20.2 recommended) | N/A | 2.4, 5.x | Elasticsearch 5 instructions | |
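The sequential rule in the table can be sketched in a few lines of Python. This is only an illustration: the names `UPGRADE_STEPS`, `parse_version`, and `plan_upgrade` are hypothetical, the rows are transcribed by hand from the table, the 2.0/2.1 row is left out because it shares its minimum with 2.2+, and only the minimum versions are used (recommended versions and the Notes column are ignored).

```python
# Rows transcribed from the upgrade table: (release line, minimum version
# required before upgrading to it). The 2.0/2.1 row is omitted because it
# shares its minimum with 2.2+; recommended versions are ignored.
UPGRADE_STEPS = [
    ("0.20.2", "0.18.1"),
    ("1.1.1", "0.20.2"),
    ("1.8", "1.0.0"),
    ("2.2", "1.7.0"),
    ("2.7", "2.0.0"),
    ("3.0", "2.4.0"),
    ("4.0", "3.3.0"),
    ("5.0", "4.3.2"),
]


def parse_version(text):
    """Turn '4.3.2' into (4, 3, 2), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def plan_upgrade(current):
    """List the (release line, required minimum) rows still ahead of `current`."""
    cur = parse_version(current)
    return [(target, minimum) for target, minimum in UPGRADE_STEPS
            if parse_version(target) > cur]


for target, minimum in plan_upgrade("2.4.0"):
    print(f"upgrade to {target}+ (must already be on {minimum} or later)")
```

If the installed version is below the minimum of the first row returned, upgrade within the current release line to that minimum first, and run the db.pl upgrade step between rows as described above.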
Arkime is prepackaged for a wide range of operating systems; packages are available on the downloads page. The Arkime development team predominantly works with the EL 8 build, choosing between the pcap and afpacket readers based on the specific needs of each deployment. You are encouraged to use the afpacket reader whenever possible to achieve the best capture performance. While a substantial portion of our development takes place on macOS, using the Homebrew package manager, this environment has not been vetted in a production setting. :) Arkime has also dropped support for 32-bit machines, so the software is incompatible with many lower-powered devices. Furthermore, on Ubuntu we support only LTS releases, due to potential library compatibility issues with non-LTS releases.
The following operating system distributions and versions are supported directly:
Here is the common checklist to perform when diagnosing a problem with Arkime (replace /opt/arkime with /data/moloch for Moloch builds):
curl http://localhost:9200/_cat/health
on the machine running OpenSearch/Elasticsearch.
An Unauthorized response probably means that you need user:pass in all OpenSearch/Elasticsearch URLs or that you are using the wrong URL.
/opt/arkime/db/db.pl http://elasticsearch.hostname:9200 info
command. You should see information about the database version and number of sessions.
http://arkime-viewer.hostname:8005
from your browser.
viewHost=localhost
is NOT set in the config.ini file. Test that curl http://IP:8005
works from the host viewer is running on.
/opt/arkime/logs/viewer.log
and that viewer is running with the pgrep -lf viewer
command. If the UI looks strange or isn't working, viewer.log
will usually have information about what is wrong.
/opt/arkime/logs/capture.log
and that capture is running with the pgrep -lf capture
command. If packets aren't being processed or there are other metadata generation issues, capture.log
will usually have information about what is wrong and links to the FAQ on how to fix.
http://arkime-viewer.hostname:8005/stats?statsTab=1
in your browser.
capture.log
capture.log
capture.log
bpf=
in /opt/arkime/etc/config.ini
. If that fixes the issue, read the
BPF FAQ answer.
/opt/arkime/etc/config.ini
to shorten the timeouts.
curl http://elasticsearch.hostname:9200/_refresh
command.
capture
after turning on debugging,
either add --debug
to the start line or add debug=1
in the [default]
section of your config.ini file.
You can add multiple --debug
options or set debug=
to a larger number to get even more information.
Capture will print out the configuration settings it is using; verify that they are what you expect.
--debug
to the start line or add debug=1
in the [default]
section of your config.ini file.
You can add multiple --debug
options or set debug=
to a larger number to get even more information.
Viewer will print out the configuration settings it is using; verify that they are what you expect.
/opt/arkime/etc/config.ini
and readable by the
viewer process.
grep moloch_packet_log /opt/arkime/logs/capture.log | tail
Verify that the packets number is greater than 0. If not, then no packets were processed.
Verify that the first pstats number is greater than 0. If not, Arkime didn't know how to decode any packets.
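The first item in the checklist above (checking cluster health) can also be scripted. The following is a minimal sketch, assuming the default `_cat/health` column order, in which the status is the fourth whitespace-separated field; the function names are hypothetical, and the URL is the same localhost example used above.

```python
import urllib.request


def parse_cat_health(line):
    """Return the cluster status (green, yellow, or red) from one _cat/health line."""
    fields = line.split()
    # Default column order: epoch timestamp cluster status node.total ...
    if len(fields) < 4:
        raise ValueError("unexpected _cat/health output: %r" % line)
    return fields[3]


def fetch_cat_health(url="http://localhost:9200/_cat/health"):
    """Fetch the raw health line; an Unauthorized reply raises urllib.error.HTTPError."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8").strip()


# Parsing a sample line (made-up values) without contacting a cluster:
sample = "1700000000 12:00:00 arkime green 3 3 120 60 0 0 0 0 - 100.0%"
print(parse_cat_health(sample))
```

Against a live cluster, `parse_cat_health(fetch_cat_health())` returns the current status; anything other than `green` is worth investigating before looking at capture or viewer.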
db.pl
script with either the init
or
wipe
commands. The only difference between the two
commands is that wipe
leaves the
added users so that they don't need to be re-added.
/opt/arkime/db/db.pl http://ESHOST:9200 wipe
/bin/rm -f /opt/arkime/raw/*
The core Arkime team advises against the use of self-signed certificates, despite their technical feasibility. Instead, we encourage leveraging the financial benefits derived from using Arkime over commercial full packet capture solutions to invest in legitimate certificates. The cost of wildcard certificates has decreased significantly, making them an affordable option. Alternatively, free certificates from Let's Encrypt represent a viable option. While members of the Arkime Slack workspace may offer assistance, the core development team typically refers queries back to this guidance. It's important to note that private CA certificates face similar challenges and require analogous solutions as those encountered with self-signed certificates.
One of the simplest methods to bypass the hurdles associated with self-signed certificates involves adding them to your operating system's list of recognized certificates or chains.
The process for doing so varies widely across different OS distributions and versions, making a quick internet search the most efficient strategy to find specific instructions.
It may be necessary to register the certificate across multiple trust stores due to the varied certificate validation locations utilized by node (for the viewer), curl (for capture), and perl (for db.pl).
The Viewer component of Arkime includes a caTrustFile
option, a feature introduced by contributors to the project.
Since version 4.2.0, all components of Arkime are designed to support the caTrustFile
configuration.
An alternative, though less secure option, involves disabling certificate verification entirely.
Key components such as Capture, Viewer, arkime_add_user.sh, and db.pl accept the --insecure
flag to bypass certificate checks.
This flag must be appended to the startup commands for both capture and viewer services.
In newer versions of Arkime, this can be done with:
cp /opt/arkime/etc/env.sample /opt/arkime/etc/capture.env
echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/capture.env
cp /opt/arkime/etc/env.sample /opt/arkime/etc/viewer.env
echo 'OPTIONS="--insecure"' >> /opt/arkime/etc/viewer.env
You can edit /opt/arkime/etc/capture.env and /opt/arkime/etc/viewer.env to add other options.
We recommend using env files instead of editing the systemd files since they may get overwritten by new installs.
Moloch 1.x has some large changes and updates that will require all session data to be reindexed. The reindexing is done in the background AFTER upgrading, so there is little downtime. Large changes in 1.0 include the following:
ip
type.
If you have any special parsers, taggers, plugins, or WISE sources, you may need to change configurations.
To upgrade:
/data/moloch/bin/moloch_update_geo.sh
on all capture
nodes that will download the new mmdb style maxmind files.
db.pl http://ESHOST:9200 upgrade
once.
Once 1.1.1 is working, you need to reindex the old session data:
/data/moloch/viewer
directory, run
/data/moloch/viewer/reindex2.js --slices X
.
curl -XDELETE 'http://localhost:9200/sessions-*'
Upgrading to Moloch 2.x is a multistep process that requires an outage. An outage is required because all the captures must be stopped before upgrading the database so that there are no schema issues or corruption. Most of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with 6.7 or 6.8. This is very important when upgrading to Elasticsearch 7.x or later.
./db.pl http://ESHOST:9200 backup pre20
to back up all administrative indices.
./db.pl http://ESHOST:9200 upgrade
.
Upgrading to Arkime 3.x is a multistep process that requires an outage. An outage is required because all the captures MUST be stopped before upgrading the database so that there are no schema issues or corruption. Do not restart the capture processes until the db.pl upgrade has finished! All of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with version 7. This is very important when upgrading to Elasticsearch 8.x or later.
arkime_
yet.
db.pl ilm
command again after upgrading.
./db.pl http://ESHOST:9200 backup pre30
to back up all administrative indices.
./db.pl http://ESHOST:9200 upgrade [other options]
, and don't forget to include any other options you usually use, like --replicas or --ilm.
db.pl ilm
command again with all the same options that were used previously.
Upgrading to Arkime 4.x requires that you are already using Arkime 3.3.0 or later. Arkime 4.x uses a new permissions model with roles.
--roles
option or the --admin
sets the superAdmin
role
./db.pl http://ESHOST:9200 upgrade [other options]
, and don't forget to include any other options you usually use, like --replicas or --ilm.
Upgrading to Arkime 5 requires that you are already using Arkime 4.3.2 or later.
/opt/arkime/db/db.pl http://ESHOST:9200 upgrade [other options]
, replace ESHOST and don't forget to include any other options you usually use, like --replicas or --ilm.
Arkime is designed to work seamlessly with both OpenSearch and Elasticsearch, underscoring our commitment to support both platforms moving forward. While some of the documentation and configurations might exclusively mention Elasticsearch, it's important to note that OpenSearch is compatible with Arkime versions that support Elasticsearch 7 and above. As the two systems continue to evolve separately, there might be new features introduced that are specific to either OpenSearch or Elasticsearch. Rest assured, Arkime is committed to remaining accessible without the necessity for any paid features of Elasticsearch, though we may choose to provide optional support for such features.
The answer, of course, is "it depends." Factors include:
The following are some important things to remember when designing your cluster:
We have some estimators that may help.
The good news is that it is easy to add new nodes in the future, so feel free to start with fewer nodes.
As a temporary fix for capacity problems, you can reduce the number of days of metadata that are stored.
You can use the Arkime ES Indices tab to delete the oldest sessions2
or sessions3
index.
The SPI data in OpenSearch/Elasticsearch and the PCAP data are not deleted at the same time.
The PCAP data is deleted as the disk fills up on the capture machines. See here for more information.
PCAP deletion happens automatically, and nothing needs to be done.
The SPI data is either deleted by using ILM or when the ./db.pl expire
command is run, usually from cron during off peak.
Unless you use ILM, the SPI data deletion does NOT happen automatically, and a cron job MUST be set up.
A cron setup that only keeps 90 days of data and expires at midnight might look like this:
0 0 * * * /opt/arkime/db/db.pl http://localhost:9200 expire daily 90
So deleting a PCAP file will NOT delete the SPI data, and deleting the SPI data will not delete the PCAP data from disk.
The UI does have commands to delete and scrub individual sessions, but
the user must have the Remove Data
ability on the users tab.
This feature is used for things you don't want operators to see, such as bad images,
and not as a general solution for freeing disk space.
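When retention differs per cluster, the nightly expire cron entry shown above can be generated rather than typed by hand. This is a small sketch, not an Arkime tool: the helper name `expire_cron_line` and its defaults are assumptions, and it only reproduces the `expire <type> <num>` form that appears in this document.

```python
def expire_cron_line(es_url, rotation, keep, hour=0, minute=0,
                     db_pl="/opt/arkime/db/db.pl"):
    """Build a crontab line that runs `db.pl expire` once a day."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Crontab field order: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {db_pl} {es_url} expire {rotation} {keep}"


# Reproduces the 90-day, midnight example from the text.
print(expire_cron_line("http://localhost:9200", "daily", 90))
```

Remember that this only manages SPI data; PCAP files are still removed separately as the capture disks fill up.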
This error means that your OpenSearch/Elasticsearch cluster cannot keep up with the number of sessions that the capture nodes are trying to send it, or that too many messages are being sent. You may only see the error message on your busiest capture nodes because capture tries to buffer the requests.
Check the following:
--replicas 1
with your daily
./db.pl expire
run after turning off replication in the sessions template using
./db.pl upgrade
without the
--replicas
option.
./db.pl upgrade
again.
If these don't help, you need to add more nodes or reduce the number of sessions being monitored. You can reduce the number of sessions with packet-drop-ips, bpf filters, or rules files, for example.
If queries are too slow, the easiest fix is to add additional OpenSearch/Elasticsearch nodes.
OpenSearch/Elasticsearch doesn't perform well if Java hits an OutOfMemory condition.
If you ever have one, you should immediately delete the oldest *sessions*
index,
update the daily expire cron to delete more often, and restart the OpenSearch/Elasticsearch cluster.
Then you should order more machines. :)
If you need to remove nodes from your OpenSearch/Elasticsearch cluster, follow these steps as an Admin:
If no shards are relocated, you might need to adjust the OpenSearch/Elasticsearch settings to permit the allocation of at least two shards per node. This is particularly crucial if you're in the process of removing several nodes from the cluster. A higher number of shards per node may be necessary to facilitate this process effectively.
curl -XPUT 'localhost:9200/sessions*/_settings' -H 'Content-Type: application/json' -d '{
"index.routing.allocation.total_shards_per_node": 2
}'
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"transient":{
"indices.recovery.concurrent_streams":6,
"indices.recovery.max_bytes_per_sec":"50mb"}
}'
Turning on replication will consume twice the disk space on the nodes and increase the network bandwidth between nodes.
To change future days, run the following command:
db/db.pl http://ESHOST:9200 upgrade --replicas 1
To change past days but not the current day, run the following command:
db/db.pl http://ESHOST:9200 expire <type> <num> --replicas 1
We recommend the second solution because it allows current traffic to be written to OpenSearch/Elasticsearch once, and during off peak the previous day's traffic will be replicated.
In general, if upgrading between minor or build versions of Elasticsearch, you can perform a rolling upgrade with no issues. Follow Elastic's instructions for the best results. Make sure you select the matching version of that document for your version of Elasticsearch from the dropdown menu on the right side of the screen.
Upgrading between major versions of Elasticsearch usually requires an upgrade of Arkime. See the following instructions:
Elasticsearch 8.x is NOT supported before 3.4.1, and we recommend that you use Arkime 4.x.
db.pl http://ESHOST:9200 upgrade
while still using Elasticsearch 7.
db.pl http://ESHOST:9200 info
or mouse over the in Arkime.
indices.breaker.total.limit
, you should unset it.
curl http://localhost:9200/_cat/health
Elasticsearch 6.x is supported by Moloch 1.x for NEW clusters and >= 1.5 for UPGRADING clusters.
NOTE – If upgrading, you must FIRST upgrade to Moloch 1.0 or 1.1 (1.1.1 is recommended) before upgrading to > 1.5. Also, all reindex operations need to be finished.
We do NOT provide Elasticsearch 6 startup scripts or configuration, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.
Upgrading to Elasticsearch 6 will REQUIRE two downtimes.
First outage: If you are NOT using Moloch DB version 51 (or later), you
must follow these steps while still using Elasticsearch 5.x. To find what DB version
you are using, either run
db.pl http://ESHOST:9200 info
or mouse over the in Moloch.
./db.pl http://ESHOST:9200 upgrade
.
Second outage: Upgrade to Elasticsearch 6.
/data/foo/<clustername>
.
curl http://localhost:9200/_cat/health
Elasticsearch 5.x is supported by Moloch 0.17.1 for NEW clusters and 0.18.1 for UPGRADING clusters.
Elasticsearch 5.0.x, 5.1.x, and 5.3.0 are NOT supported because of Elasticsearch bugs/issues. We currently use 5.6.7.
WARNING – If you have sessions-*
indices
created with Elasticsearch 1.x, you can NOT upgrade. Those indices will need to be
deleted.
We do NOT provide Elasticsearch 5 startup scripts, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.
Upgrading to Elasticsearch 5 may REQUIRE 2 downtime periods of about 5–15 minutes each.
First outage: If you are NOT using Moloch DB version 34 (or later), you
must follow these steps while still using Elasticsearch 2.4. To find what DB version
you are using, either run db.pl http://ESHOST:9200 info
or mouse over the in Moloch.
curl http://localhost:9200/_cat/health
./db.pl http://ESHOST:9200 upgrade
.
Second outage: Upgrade to Elasticsearch 5.
curl http://localhost:9200/_cat/health
This error usually happens when the capture process is trying to update the stats data and falls behind.
Arkime will continue to function while this error occurs with the stats or dstats index; however, it does usually mean that your Elasticsearch cluster is overloaded.
You should consider increasing your Elasticsearch capacity by adding more nodes, CPU, and/or more memory.
If increasing Elasticsearch capacity isn't an option, then reduce the amount of traffic that Arkime processes.
If the N vs. M version numbers are very different from each other, it usually means that you are running two nodes with the same node name at the same time, which is not supported.
Here are some of our recommended OpenSearch/Elasticsearch settings. Many of these
can be updated on the fly, but it is still best to put them in your
elasticsearch.yml
file. We strongly recommend using the same
elasticsearch.yml
file on all hosts. Things that need to be
different per host can be set with variables.
You will probably want to change the watermark settings so you can use
more of your disk space. You have the option to use ALL percentages or
ALL values, but you can't mix them. The most common sign of a problem
with these settings is an error that has
FORBIDDEN/12/index read-only / allow delete
in it. You can
use
./db.pl http://ESHOST:9200 unflood-stage _all
to clear the error once you adjust the settings and/or delete some
data.
Elasticsearch Docs
cluster.routing.allocation.disk.watermark.low: 97%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99%
Or, if you want more control, use values instead of percentages:
cluster.routing.allocation.disk.watermark.low: 300gb
cluster.routing.allocation.disk.watermark.high: 200gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
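The two styles read in opposite directions: percentage watermarks are thresholds on disk space used, while absolute values are thresholds on free space remaining. The sketch below illustrates that difference using the example settings above; the function names are hypothetical, and the boundary handling is approximate rather than a copy of Elasticsearch's internal logic.

```python
GB = 1024 ** 3


def level_from_percent(used_bytes, total_bytes, low=97.0, high=98.0, flood=99.0):
    """Classify a node when watermarks are percentages of disk USED."""
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct > flood:
        return "flood_stage"
    if used_pct > high:
        return "high"
    if used_pct > low:
        return "low"
    return "ok"


def level_from_free(free_bytes, low=300 * GB, high=200 * GB, flood=100 * GB):
    """Classify a node when watermarks are absolute amounts of FREE space."""
    if free_bytes < flood:
        return "flood_stage"
    if free_bytes < high:
        return "high"
    if free_bytes < low:
        return "low"
    return "ok"


# A 10,000 GB disk with 9,850 GB used is 98.5% full and has 150 GB free.
print(level_from_percent(9850 * GB, 10000 * GB))
print(level_from_free(150 * GB))
```

Once a node passes the flood stage, indices with shards on it are marked read-only, which is when the FORBIDDEN/12 error described above appears.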
If you have a lot of shards that you want to be able to search against at once, raise the shard count limit (Elasticsearch Docs):
action.search.shard_count.limit: 100000
No longer need to set since Elasticsearch 7.9.
If you hit a lot of bulk failures, this can help, but Elastic doesn't
recommend raising it too much. In older versions of Elasticsearch, it is
named thread_pool.bulk.queue_size
, so check the docs for
your version.
Elasticsearch Docs
thread_pool.write.queue_size: 10000
On by default in most versions, allows for HTTP compression. Elasticsearch Docs
http.compression: true
To speed up recovery times and startup times, there are a few controls to experiment with. Make sure you test them in your environment and slowly increase them because they can break things badly. Elasticsearch Allocation Docs and Elasticsearch Recovery Docs
cluster.routing.allocation.cluster_concurrent_rebalance: 10
cluster.routing.allocation.node_concurrent_recoveries: 5
cluster.routing.allocation.node_initial_primaries_recoveries: 5
indices.recovery.max_bytes_per_sec: "400mb"
By default, Elasticsearch has logging set to debug level in log4j2.properties. For busy clusters, change this to info level to lower CPU and disk usage.
logger.action.level = info
Since Moloch 2.2, you can easily use ILM to move indices from hot to warm, force merge, and delete.
We recommend only using ILM with newer versions (7.2+) of Elasticsearch because older versions had some issues.
Once ILM is enabled, you no longer have to use the db.pl expire
cron job but should occasionally run db.pl optimize-admin
.
ILM is only included in the free "basic" Elasticsearch license, so it is not part of the Elasticsearch OSS distribution, and you may need to upgrade. Arkime does NOT currently support the ILM auto rollover feature, for performance reasons, when searching.
These instructions assume you are using db.pl or Arkime UI to set up ILM and will use a special molochtype
attribute name.
You can also do this with Kibana to create the ILM config and not use the molochtype
attribute name, but you will then need to do everything on your own.
In order for ILM to work correctly with Arkime, follow these five important steps:
node.attr.molochtype: warm
or node.attr.molochtype: hot
db.pl ilm
command.
db.pl ilm
can perform this action.
db.pl upgrade ... --ilm
and add --ilm
to the command. Also add --hotwarm
if using a hot/warm design.
db.pl expire
cron job with db.pl optimize-admin
So for example, to create a new policy that keeps 30 weeks of history, 90 days of SPI data, 1 replica, and optimizes all indices older than 25 hours, you would run:
./db.pl http://localhost:9200 ilm 25h 90d --history 30 --replicas 1
You would then need to run upgrade with all the arguments you usually use, plus --ilm:
./db.pl http://localhost:9200 upgrade --replicas 1 --shards 5 --ilm
Arkime prioritizes cost-effective solutions by utilizing standard hardware components. Consider carefully before opting for upgrades like SSDs or high-end network cards. In some cases, adding another server might be a more economical choice. This approach not only reduces individual machine costs but also enhances overall data retention capacity.
Choosing the right machine for the job requires careful consideration. Here are some key factors to keep in mind:
When selecting Arkime capture boxes, standard "Big Data" boxes might be the best bet ($10k–$25k each). Look for:
We are big fans of using network packet brokers (NPBs) ($6k+). They allow
multiple taps/mirrors to be aggregated and load balanced across multiple
capture
machines. Read more in the following sections.
We are big fans of using NPBs, and we recommend that medium or large Arkime deployments use an NPB. See the MolochON 2017 NPB Presentation.
Main advantages:
Features to look for:
Just like with Arkime, with commodity hardware, you don't necessarily have to pay a lot of money for a good NPB. Some switch vendors offer switches that can operate in switch mode or NPB mode, so you might already have gear you can use.
Sample vendors
arkime-capture
handle?
On basic commodity hardware, achieving throughput of 3 Gbps or more is easy, depending largely on the number of CPUs allocated to Arkime and the other tasks the machine is handling. Often, the bottleneck in performance is the speed of the disks and the RAID configuration. For further details, refer to the Architecture and Multiple Host sections. Arkime supports the utilization of multiple threads for both packet acquisition and packet processing.
To test the local RAID device, use:
dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
To test a NAS, leave off the oflag=direct and make sure you test with at least 3x the amount of memory so that cache isn't a factor:
dd bs=256k count=150000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test
The output represents the maximum disk performance. If you wish to obtain a more accurate assessment, run several tests and average the results. To avoid packet loss, it's advisable to operate Arkime at no more than approximately 80% of the maximum disk performance. For systems utilizing RAID, aiming for about 60% of this performance metric can further minimize the risk of dropping packets, especially during RAID rebuilds. It's important to note that network throughput is typically measured in bits, whereas disk performance is gauged in bytes, requiring the conversion of these measurements for accurate comparison.
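The bits-versus-bytes conversion is easy to get wrong, so here is the arithmetic as a short sketch. It assumes the dd result is reported in decimal megabytes per second (as GNU dd prints it) and treats all captured traffic as written to disk; the function name and the sample figure of 500 MB/s are illustrative only.

```python
def sustainable_gbps(disk_mb_per_sec, headroom=0.8):
    """Convert a measured disk write rate (MB/s) into a safe capture rate (Gbit/s)."""
    bits_per_sec = disk_mb_per_sec * 1_000_000 * 8   # bytes -> bits
    return bits_per_sec * headroom / 1_000_000_000   # bits/s -> Gbit/s, with headroom


# A dd result of 500 MB/s:
print(round(sustainable_gbps(500), 2))                 # 80% headroom
print(round(sustainable_gbps(500, headroom=0.6), 2))   # 60% headroom for RAID rebuilds
```

In other words, a disk that writes 500 MB/s has a raw ceiling of 4 Gbps, which drops to roughly 3.2 Gbps at 80% and 2.4 Gbps at 60%.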
When you get an error about the capture length not matching the packet length, it is NOT an issue with Arkime. The issue is with the network card settings OR how the pcap file was created.
By default, modern network cards offload work that the CPUs would otherwise need to do. They will defragment packets or reassemble TCP sessions and pass the results to the host. However, this is NOT what we want for packet captures; we want what is actually on the network. So you will need to configure the network card to turn off all the features that hide the real packets from Arkime.
The sample config files
(/opt/arkime/bin/arkime_config_interfaces.sh
) turn off many
common features but there are still some possible problems:
ethtool -k INTERFACE | grep on
— Anything that is
still on, turn off and see if that fixes the problem.
Items that say [fixed]
can NOT be disabled with ethtool.
ethtool -K INTERFACE tx off sg off gro off gso off lro off tso off
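The "anything still on, unless it is [fixed]" check above can be automated. The sketch below parses the text printed by `ethtool -k` and returns the features that are still on and can be changed; the function names are hypothetical, and the sample output is abbreviated and made up for illustration.

```python
import subprocess


def features_still_on(ethtool_output):
    """Return names of features reported as 'on' that are not marked [fixed]."""
    changeable = []
    for line in ethtool_output.splitlines():
        name, sep, state = line.strip().partition(":")
        if not sep:
            continue  # not a "feature: state" line
        state = state.strip()
        if state.startswith("on") and "[fixed]" not in state:
            changeable.append(name.strip())
    return changeable


def read_features(interface):
    """Run `ethtool -k INTERFACE` and return its output (requires ethtool installed)."""
    result = subprocess.run(["ethtool", "-k", interface],
                            capture_output=True, text=True, check=True)
    return result.stdout


sample = """Features for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: on
\ttx-scatter-gather: on
tcp-segmentation-offload: off
generic-receive-offload: on
rx-vlan-filter: on [fixed]
"""
print(features_still_on(sample))
```

On a capture host, `features_still_on(read_features("eth0"))` lists the candidates to switch off with `ethtool -K`, one at a time, until the length mismatch errors stop.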
Workarounds:
-s 0
.
readTruncatedPackets=true
in the config file, most protocol parsing in Arkime will not work.
readTruncatedPackets=true
in the config file.
snapLen=65536
in the config file, this is not recommended.
There are several different types of packet drops and reasons for packet drops:
Please make sure you are using a recent version of Arkime. Constant improvements are made and it is hard for us to support older versions.
The most common cause of packet drops with Arkime is leaving the reader default of libpcap instead of switching to tpacketv3, pfring or one of the other high performance packet readers. We strongly recommend tpacketv3. See plugin settings for more information.
Make sure the network card is configured correctly by increasing the ring buf to max size and turning off most of the card's features. The features are not useful anyway, since we want to capture what is on the network instead of what the local OS sees. Example configuration:
# Set ring buf size, see max with ethtool -g eth0
ethtool -G eth0 rx 4096 tx 4096
# Turn off features, see available features with ethtool -k eth0
ethtool -K eth0 rx off tx off sg off tso off gso off
If Arkime was installed from the deb/rpm and the Configure script was
used, this should already be done in
/data/moloch/bin/moloch_config_interfaces.sh
The packetThreads config option controls the number of threads processing
the packets, not the number of threads reading the packets off the network card.
You only need to change the value if you are getting the Packet Q is overflowing
error.
The packetThreads option is limited to 24 threads, but usually you only need a few.
Configuring too many packetThreads is actually worse for performance, so start with a lower number and increase it slowly.
You can also change the size of the packet queue by increasing the maxPacketsInQueue setting.
To increase the number of threads the reader uses please see the documentation for the reader you are using on the settings page.
In general, errors about the Disk Q being exceeded are NOT a problem with Arkime, but usually an issue with either the hardware or the packet rate exceeding what the hardware can save to disk. You will usually need to either fix/upgrade the hardware or reduce the amount of traffic being saved to disk.
dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
This is the MAX disk performance. Run several times if desired and take
the average. If you don't want to drop any packets, you shouldn't average
more than ~80% of the MAX disk performance. If using RAID and you don't want
to drop packets during a future rebuild, ~60% is a better value. Remember
that most network numbers will be in bits while the disk performance will
be in bytes, so you'll need to adjust the values before comparing.
xfs
make sure you use mount options defaults,inode64,noatime
If using EMC for disks:
To check your disk IO run iostat -xm 5
and look at the
following:
avgqu-sz
should be near or less than 1, otherwise Linux is
queueing instead of doing
capture
down.
Other things to do/check:
arcconf SETCACHE 1 LOGICALDRIVE 1 WBB
hpssacli ctrl slot=0 modify dwc=enable
MegaCli64 -LDSetProp -ForcedWB -Immediate -Lall -aAll ; MegaCli64 -LDSetProp Cached -L0 -a0 -NoLog
capture
isn't
handling lots of interrupts (cat /proc/interrupts
).
capture
.
curl http://arkime-wise.hostname:8081/views
on the capture
host that is dropping packets.
See settings
Think of the capture
binary much like you would tcpdump
.
The capture
binary can listen to live network interface(s), or read from historic packet capture files.
Currently Arkime works best with PCAP files, not PCAPng.
/opt/arkime/bin/capture -c [config_file] -r [PCAP file]
For an entire directory, use -R [PCAP directory]
See
/opt/arkime/bin/capture --help
for more info.
The --monitor
to monitor non NFS directories, --skip
to skip already loaded PCAP files, and -R
to process directories are common options. Multiple -r
and -R
options can be used.
If Arkime is failing to load a PCAP file check the following things:
--debug
which might warn of not
understanding the link type or GRE tunnel type. (Please open issues for
unknown link or GRE types)
By default, importing offline pcap does NOT make a copy of the pcap file; Arkime saves a reference to the original file. If you want to make a copy of the pcap file, use the --copy
option with capture.
It is also possible to enable PCAP uploads in the Arkime UI.
This is less efficient than just using capture
directly, since it uploads the file and then runs capture
for you.
Just uncomment the uploadCommand
in the config.ini file.
The easy way is using the interface setting in your config.ini. It supports a semicolon ';' separated list of interfaces to listen on for live traffic. If you want to set a tag or another field per interface, use the interfaceOps setting.
The hard way is to run multiple capture
processes.
/opt/arkime/etc/config.ini
, and create a section for
each of the Arkime nodes. Assuming the defaults are correct in the
[default]
section, the only thing that
MUST be set is the interface item. It is also common
to have each Arkime node talk to a different OpenSearch/Elasticsearch node if
running a cluster of OpenSearch/Elasticsearch nodes.
The arkime-m01
is an EXAMPLE node name.
[arkime-m01a]
interface=eth2
[arkime-m01b]
interface=eth5
hostname
+ domainname
on the machine
doesn't return a FQDN, you'll also need to set a viewUrl, or easier
use the --host
option.
mv /etc/systemd/system/arkimecapture.service /etc/systemd/system/arkimecapture1.service
cp /etc/systemd/system/arkimecapture1.service /etc/systemd/system/arkimecapture2.service
ExecStart=/bin/sh -c '/opt/arkime/bin/capture -n arkime-m01a -c /opt/arkime/etc/config.ini
and ExecStart=/bin/sh -c '/opt/arkime/bin/capture -n arkime-m01b -c /opt/arkime/etc/config.ini
systemctl daemon-reload; systemctl start arkimecapture1; systemctl start arkimecapture2
You only need to run one viewer on the machine. Unless
it is started with the -n
option, it will still use the
hostname as the node name, so any special settings need to be set there
(although default is usually good enough).
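The per-node sections described above can be generated rather than typed by hand. The following is a minimal sketch using Python's standard `configparser`; the helper name `node_sections` and the node-to-interface mapping are hypothetical, and only the `interface` key and the example node names come from the text above.

```python
import configparser
import io


def node_sections(interfaces):
    """Render one [node] section per capture process.

    interfaces maps a node name (the value passed to capture -n)
    to the interface that node should listen on.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key case exactly as written
    for node, iface in interfaces.items():
        cfg[node] = {"interface": iface}
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


print(node_sections({"arkime-m01a": "eth2", "arkime-m01b": "eth5"}))
```

The output is meant to be pasted into the shared config.ini below the existing [default] section; it does not touch the systemd unit files.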
Please file an issue on GitHub with the stack trace.
sysctl
to change until the next reboot. Setting it to
0 will change it back to the default.
sysctl -w fs.suid_dumpable=2
capture
is running in.
capture
and get it to crash.
gdb
(you may need to install the gdb package first)
gdb /opt/arkime/bin/capture corefilename
bt
command
If it is easy to reproduce, sometimes it's easier to just run
gdb
as root:
gdb capture
as root.
gdb
with
run ALL_THE_ARGS_USED_FOR_ARKIME-CAPTURE_GO_HERE
.
bt
command.
g_log
b g_log
Usually capture
is started as root so that it can
open the interfaces and then it immediately drops privileges to
dropUser
and dropGroup
, which are by default
nobody:daemon
. This means that all parent directories need
to be either owned or at least executable by nobody:daemon
and that the pcapDir itself must be writeable.
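The permission requirement above can be checked before starting capture. This is a rough sketch, not part of Arkime: the function name `can_use_pcap_dir` is hypothetical, it only looks at classic owner/group/other mode bits, and it ignores ACLs, SELinux, and root's special privileges. Pass it the uid and group ids that correspond to your dropUser and dropGroup.

```python
import os


def can_use_pcap_dir(pcap_dir, uid, gids):
    """Return a list of problems that would stop uid/gids from writing PCAP.

    Every parent directory must be searchable (execute bit) and the
    pcapDir itself must be both writable and searchable.
    """
    def allowed(st, want):
        # Pick the owner, group, or other permission triplet.
        if st.st_uid == uid:
            bits = (st.st_mode >> 6) & 7
        elif st.st_gid in gids:
            bits = (st.st_mode >> 3) & 7
        else:
            bits = st.st_mode & 7
        return bits & want == want

    path = os.path.abspath(pcap_dir)
    parents = []
    parent = os.path.dirname(path)
    while True:
        parents.append(parent)
        up = os.path.dirname(parent)
        if up == parent:  # reached the filesystem root
            break
        parent = up

    problems = []
    for p in reversed(parents):
        if not allowed(os.stat(p), 1):  # 1 = execute
            problems.append(f"{p}: not executable")
    if not allowed(os.stat(path), 3):  # 3 = write + execute
        problems.append(f"{path}: not writable and executable")
    return problems
```

An empty list means the mode bits alone would not block the dropped user; any entry names the first directory to fix.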
Listed in order from highest to lowest benefit to Arkime
bpf=
filter will stop Arkime from seeing the
traffic.
packet-drop-ips
section will stop
Arkime from adding packets to the PacketQ
Arkime capture supports many options for controlling which packets are captured, processed, and saved to disk.
bpf=
in the config file. This filter can be implemented in
the kernel, the network card, libpcap or network drivers. It is a
single filter and it controls what Arkime capture "sees" or doesn't
"see". Any packet that is dropped because of the bpf filter is usually
not counted in ANY Arkime stats, but some implementations do expose
stats.
packet-drop-ips
config section to see if
the IPs involved are marked to be discarded. If there are only a few
IPs to drop then bpf=
should be used, otherwise this is
much more efficient than a huge bpf.
_dropByDst
or _dropBySrc
timeout,
if it matches they will be discarded.
packetThreads
or
maxPacketsInQueue
if too
many packets are being dropped here.
dontSaveBPFs
, and if one matches it will
save off the max number of packets to save for the session. This will
override the maxPackets
config setting.
minPacketsSaveBPFs
and save off a min number of packets that must be received.
PCAP deletion is handled by the viewer process, so verify the viewer process is running on all capture instances. The viewer process checks on startup and then every minute to see how much space is available, and if it is below freeSpaceG, then it will start deleting the oldest file. The viewer process logs every time a file is deleted, so it is possible to figure out when a file is deleted if needed. If the viewer complains about not finding the PCAP data, make sure you check the viewer.log.
Note: freeSpaceG can be a number freeSpaceG=1000
or
a percentage, with freeSpaceG=5%
the default.
The viewer process will always leave at least 10 PCAP files on the disk,
so make sure the disk has at least maxFileSizeG * 10
of space for capture files, or 120G by default.
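The arithmetic behind these two settings can be written out as follows. This is an illustrative sketch, not the viewer's code: both function names are hypothetical, the percentage form is assumed to be a percentage of the total disk size, and the default of 12 for maxFileSizeG is inferred from the "120G" figure above.

```python
def free_space_threshold_g(free_space_g, disk_total_g):
    """Convert a freeSpaceG setting into a threshold in gigabytes.

    free_space_g is the raw setting text, e.g. "1000" or "5%".
    Assumption: a percentage is taken against the total disk size.
    """
    value = free_space_g.strip()
    if value.endswith("%"):
        return disk_total_g * float(value[:-1]) / 100.0
    return float(value)


def min_disk_needed_g(max_file_size_g=12, kept_files=10):
    """Space taken by the PCAP files the viewer always leaves on disk."""
    return max_file_size_g * kept_files


print(free_space_threshold_g("5%", 2000))  # threshold on a 2000G disk
print(min_disk_needed_g())                 # space needed for the kept files
```

If the threshold is larger than the disk can ever have free once the kept files are written, deletion cannot keep up, which is worth checking on small disks.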
If still having PCAP delete issues:
debug=2
in the [default]
section of your config.ini file. After restarting viewer, check the viewer.log for messages or use grep -i expire /opt/arkime/logs/viewer.log
to see the relevant messages
maxFileSizeG * 10
space available.
db.pl http://localhost:9200 sync-files
command
locked
set,
viewer won't delete locked files
sestatus
) temporarily disable it (setenforce 0
) and see if that fixes the problem.
There are several common reasons dontSaveBPFs might not work for you.
--debug
to capture when starting and looking at the output
dontSaveBPFs=tcp port 443:10
use something like dontSaveBPFs=tcp port 443 or (vlan and tcp port 443):10
.
Basically FILTER or (vlan and FILTER)
. Information from here.
tcpdump -i INTERFACE tcp port 443
for example.
If you are still having issues, try an Arkime Rules file. Arkime converts dontSaveBPFs into a rule for you behind the scenes, so Arkime Rules are actually more powerful.
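The "FILTER or (vlan and FILTER)" rewrite for VLAN-tagged traffic is mechanical, so it can be scripted. A minimal sketch, assuming each dontSaveBPFs entry has the form FILTER:COUNT as in the example above; the function name `vlan_aware` is hypothetical, and a filter that itself contains a colon (an IPv6 address, for instance) would need more careful splitting.

```python
def vlan_aware(entry):
    """Rewrite 'FILTER:COUNT' as 'FILTER or (vlan and FILTER):COUNT'.

    The split is on the LAST colon, so the packet count is preserved.
    An entry without a colon is treated as a bare filter.
    """
    bpf, sep, count = entry.rpartition(":")
    if not sep:
        bpf = entry
    wrapped = f"{bpf} or (vlan and {bpf})"
    return f"{wrapped}:{count}" if sep else wrapped


print(vlan_aware("tcp port 443:10"))
```

For the example entry this reproduces the filter shown above. Compound filters that already use "or" may need extra parentheses, which this sketch does not add.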
Arkime optimizes disk writes for efficiency, making it highly suitable for high bandwidth networks.
However, this approach might not be ideal for low bandwidth environments.
The amount of data Arkime buffers before writing to disk is determined by the pcapWriteSize
setting, which has a default value of 262144 bytes.
It's crucial to remember that this buffering occurs on a per-thread basis.
Therefore, for low bandwidth networks, it's advisable to set packetThreads
to 1 (a single thread).
The system is designed to flush the buffered PCAP to disk after 10 seconds of inactivity, but direct-io requires pagesize bytes to still be buffered, typically around 4096 bytes.
Encountering an error message such as ERROR - processSessionIdDisk - SESSIONID in file FILENAME couldn't read packet at FILEPOS packet # 0 of 2
or Not enough data 0 for header 16
usually indicates that the PCAP data is either still in the process of buffering and requires more time to be fully written to disk, or it suggests that a capture process or system crash occurred before the PCAP data could be saved.
It may be useful to turn OFF compression simpleCompression=none
on low bandwidth networks since compression causes more data to be buffered.
Note, running out of disk space can lead to the creation of numerous zero-byte PCAP files. For details on managing disk space and preventing such issues, refer to PCAP Deletion.
In small environments with low amounts of traffic this is possible. With Open vSwitch you can create a mirror port from a physical or virtual adapter and send the data to another virtual NIC as the listening interface. In KVM, one issue is that the Virtio network adapter does not allow the buffer size to be increased past 256 (mentioned in another part of the FAQ). Without a larger buffer, Arkime capture will continuously crash. To solve this in KVM, use the E1000 adapter and configure the buffer size accordingly. Then set up the SPAN port on Open vSwitch to send traffic to it: https://www.rivy.org/2013/03/configure-a-mirror-port-on-open-vswitch/.
MaxMind recently changed how you download their free database files. You now need to sign up for an account and set up the geoipupdate program. If using a version of Moloch before 2.2, you will need to edit your config.ini file and update the geolite paths.
Instructions:
yum install geoipupdate
or apt install geoipupdate
geoipupdate
as root and see if it works
/usr/share/GeoIP/GeoLite2-Country.mmdb
and geoLite2ASN is now /usr/share/GeoIP/GeoLite2-ASN.mmdb
Arkime logs a lot of information for debugging purposes.
Much of this information is for bug reports, but can also be used to figure out what is going on.
You may need to use --debug
to enable these messages.
Jan 01 01:01:01 http.c:369 moloch_http_curlm_check_multi_info(): 8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms
Jan 01 01:01:01 | Date |
http.c:369 | File Name:Line Number |
moloch_http_curlm_check_multi_info | Function Name |
8000/30 | 8000 queued requests to server 30 connections to server |
ASYNC | Asynchronous request, SYNC for Synchronous request |
200 | HTTP status code |
http://eshost:9200/_bulk | Requested URL |
250342/5439 | 250342 bytes uploaded (CURLINFO_SIZE_UPLOAD) 5439 bytes downloaded (CURLINFO_SIZE_DOWNLOAD) |
14ms | 14ms to connect to server (CURLINFO_CONNECT_TIME) |
12345ms | 12345ms total request time (CURLINFO_TOTAL_TIME) |
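When many of these lines need to be compared, the field breakdown above can be turned into a parser. This is a sketch built only from the sample line and table shown here; the regular expression, the function name `parse_http_log`, and the dictionary keys are my own, and other Arkime versions may log a different layout.

```python
import re

# One named group per column of the table above.
HTTP_LOG = re.compile(
    r"(?P<date>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<file>[\w.]+):(?P<line>\d+) "
    r"(?P<func>\w+)\(\): "
    r"(?P<queued>\d+)/(?P<connections>\d+) "
    r"(?P<mode>A?SYNC) "
    r"(?P<status>\d{3}) "
    r"(?P<url>\S+) "
    r"(?P<uploaded>\d+)/(?P<downloaded>\d+) "
    r"(?P<connect_ms>\d+)ms "
    r"(?P<total_ms>\d+)ms$"
)

INT_FIELDS = ("line", "queued", "connections", "status",
              "uploaded", "downloaded", "connect_ms", "total_ms")


def parse_http_log(text):
    """Return the fields of one HTTP log line as a dict, or None."""
    match = HTTP_LOG.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields


sample = ("Jan 01 01:01:01 http.c:369 moloch_http_curlm_check_multi_info(): "
          "8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms")
print(parse_http_log(sample))
```

A large `queued` value combined with a high `total_ms` is the pattern to look for when OpenSearch/Elasticsearch is not keeping up with bulk requests.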
Jan 01 01:01:01 packet.c:1185 moloch_packet_log(): packets: 3911000000 current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 pstats: 4132185901/1/2/3/4/5/6
Jan 01 01:01:01 | Date |
packet.c:1185 | File Name:Line Number |
moloch_packet_log | Function Name |
packets: 3911000000 | 3911000000 packets are going to be processed by the packet queues. These packets have made it past corrupt checks, packet-drop-ips checks, and are ones we most likely understand. |
current sessions: 41771/45251 | 41771 monitored sessions of the current session type (usually tcp) 45251 monitored sessions total |
oldest: 0 | In the current session type queue, the oldest session should be idled out in 0 seconds |
recv: 4028852297 | 4028852297 packets have been received by the interface since capture start, as reported by the reader's stats api |
drop: 123 | 123 packets have been dropped by the interface since capture started, as reported by the reader's stats api |
(0.00) | 0.00% packets have been dropped by the interface since capture started, as reported by the reader's stats api |
queue: 1 | 1 bulk request is waiting to be sent to the OpenSearch/Elasticsearch servers, each bulk request may hold multiple sessions |
disk: 2 | 2 disk buffers writes are outstanding, each buffer will hold multiple packets |
packet: 3 | 3 packets are waiting to be processed in all the packet queues |
close: 4 | 4 tcp sessions have been marked for closing (RST/FIN), waiting on last few packets |
ns: 5 | 5 sessions are ready to be saved but there is a plugin that is doing async work, such as WISE |
frags: 0/1988 | always 0 1988 current ip frags waiting to be matched |
pstats: 4132185901/1/2/3/4/5/6 | 4132185901 packets successfully sent to a packet queue 1 packet dropped because of packet-drop-ips config 2 packets dropped because the packet queues were overloaded 3 packets dropped because they were corrupt 4 packets dropped because how to process was unknown to us 5 packets dropped because of ipport rules 6 packets dropped because of packet deduping (2.7.1 enablePacketDedup) |
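The drop counters in this line are the ones most often watched, so it can help to pull them out programmatically. A small sketch based only on the sample line above: the function name `packet_stats` and the labels in `PSTATS_NAMES` are hypothetical shorthand for the seven pstats meanings in the table, and the percentage is computed here as drop over recv, which is an assumption about how the logged value is derived.

```python
import re

# Labels for the seven pstats counters, in the order listed in the table.
PSTATS_NAMES = ("queued", "drop_ips", "overloaded", "corrupt",
                "unknown", "ipport_rules", "dedup")


def packet_stats(text):
    """Extract recv/drop and the pstats counters from a packet log line."""
    recv = int(re.search(r"recv: (\d+)", text).group(1))
    drop = int(re.search(r"drop: (\d+)", text).group(1))
    raw = re.search(r"pstats: ([\d/]+)", text).group(1)
    counters = dict(zip(PSTATS_NAMES, (int(n) for n in raw.split("/"))))
    percent = 100.0 * drop / recv if recv else 0.0
    return {"recv": recv, "drop": drop,
            "drop_percent": percent, "pstats": counters}


sample = ("Jan 01 01:01:01 packet.c:1185 moloch_packet_log(): packets: 3911000000 "
          "current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) "
          "queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 "
          "pstats: 4132185901/1/2/3/4/5/6")
print(packet_stats(sample))
```

A growing `overloaded` counter points at the packet queues, while a growing `drop` value points at the interface or reader.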
Click on the owl and read the Search Bar section. The Fields section is also useful for discovering fields you can use in a search expression.
The most common cause of this problem is that the clocks on the Arkime machines are out of sync. Make sure NTP is running everywhere, or that the clocks are otherwise kept in sync.
Recent versions of Chrome, Firefox, and Safari should all work fairly equally. Below are the minimum versions required. We aren't kidding.
Arkime Version | Chrome | Firefox | Opera | Safari | Edge | IE |
---|---|---|---|---|---|---|
Prior to 3.0 | 53 | 54 | 40 | 10 | 14 | Not Supported |
3.x, 4.x | 80 | 74 | 67 | 13.1 | 80 | Not Supported |
5.x and beyond | 92 | 95 | 78 | 15.4 | 92 | Not Supported |
Development and testing is done mostly with Chrome on a Mac, so it gets the most attention.
This seems to be caused when proxying requests from one viewer node to
another and the machines don't use FQDNs for their hostnames and the
short hostnames are not resolvable by DNS. You can check if your
machine uses FQDNs by running the hostname
command. There
are several options to resolve the error:
--host
option on capture
config.ini
and add a viewUrl
for each
node. This part of the config file must be the same on all machines
(we recommend you just use the same config file everywhere). Example:
[node1_eth0]
interface=eth0
viewUrl=http://node1.fqdn
[node1_eth1]
interface=eth1
viewUrl=http://node1.fqdn
[node2]
interface=eth1
viewUrl=http://node2.fqdn
Apache, and other web servers, can provide authentication or other services for Arkime when set up as a reverse proxy. When a reverse proxy is used for authentication it must be inline, and Arkime's own authentication will not be used; Arkime will still do the authorization. Arkime uses the username that the reverse proxy passes as an HTTP header for settings and authorization. See the architecture page for diagrams. While operators will use the proxy to reach the Arkime viewer, the viewer processes still need direct access to each other.
setsebool -P httpd_can_network_connect 1
is required.
ARKIME_USER
is the header that is being
set from a variable, if your auth method already sets a header use
that.
AuthType your_auth_method
Require valid-user
RequestHeader set ARKIME_USER %{your_auth_method_concept_of_username_variable_probably_REMOTE_USER}e
SSLProxyEngine On
#ProxyRequests On # You probably don't want this line
ProxyPass /arkime/ https://localhost:8005/ retry=0
ProxyPassReverse /arkime/ https://localhost:8005/
config.ini
userNameHeader
to the lower case version of the header Apache is setting. NOTE - the
userNameHeader setting is only needed on viewers that Apache talks to; don't set it on all of them.
webBasePath
to the ProxyPass location used above.
All other sections should NOT have a webBasePath
.
viewHost=localhost
, so externals can't just set the userNameHeader
and access Arkime with no auth:
[arkime-proxy]
userNameHeader=arkime_user
webBasePath = /arkime/
viewPort = 8005
viewHost = localhost
arkime-proxy
viewer, so for this example you would need to add -n arkime-proxy
to your systemd file (/etc/systemd/system/arkimeviewer.service by default) on the ExecStart line after viewer.js so viewer uses that section
addUser.js
script.
userNameHeader
is the
lower case version of the header Apache is using.
viewer.js
with a --debug
and see if the
header is being sent.
It is possible to search multiple Arkime clusters by setting up a special Arkime MultiViewer and a special MultiES process. The MultiES process is similar to Elasticsearch tribe nodes, except it was created before tribe nodes and can deal with multiple indices having the same name. The MultiViewer talks to MultiES instead of a real OpenSearch/Elasticsearch instance. Currently one big limitation is that all Arkime clusters must use the same serverSecret.
To use MultiES, create another config.ini
file or section in a shared config file.
Both multies.js
and the special "all" viewer can use the same node name.
See Multi Viewer Settings for more information.
# viewer/multies node name (-n allnode)
[allnode]
# The host and port multies is running on, set with multiESHost:multiESPort usually just run on the same host
elasticsearch=127.0.0.1:8200
# This is a special multiple arkime cluster viewer
multiES=true
# Port the multies.js program is listening on, elasticsearch= must match
multiESPort = 8200
# Host the multies.js program is listening on, elasticsearch= must match
multiESHost = localhost
# Semicolon list of OpenSearch/Elasticsearch instances, one per arkime cluster. The first one listed will be used for settings
# You MUST have a name set
multiESNodes = http://escluster1.example.com:9200,name:escluster1,prefix:PREFIX;http://escluster2.example.com:9200,name:escluster2
# Uncomment if not using different rotateIndex settings
#queryAllIndices=false
Now you need to start up both the multies.js
program and
viewer.js
with the same config file AND -n allnode
.
All other viewer settings, including webBasePath
can still be used.
By default, the users table comes from the first cluster listed in multiESNodes
.
This can be overridden by setting usersElasticsearch
and
optionally usersPrefix
in the multi viewer config file.
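The multiESNodes format (semicolon-separated clusters, each with comma-separated `key:value` options after the URL) is easy to get wrong, so a quick check before starting MultiES can help. This is a sketch inferred from the example value above, not the parser MultiES uses; the function name `parse_multi_es_nodes` is hypothetical.

```python
def parse_multi_es_nodes(value):
    """Split a multiESNodes value into a list of dicts, one per cluster.

    Each item is URL[,key:value...]; a name option is required.
    """
    nodes = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        url, *options = item.split(",")
        node = {"url": url}
        for option in options:
            key, _, val = option.partition(":")
            node[key] = val
        if "name" not in node:
            raise ValueError(f"{url}: a name is required")
        nodes.append(node)
    return nodes


nodes = parse_multi_es_nodes(
    "http://escluster1.example.com:9200,name:escluster1,prefix:PREFIX;"
    "http://escluster2.example.com:9200,name:escluster2"
)
print(nodes[0]["name"])  # the first cluster supplies settings and users by default
```

Because the first entry is the one used for settings and the users table, reordering the list changes behavior even when the set of clusters is the same.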
Since 4.2.0 MultiES supports the caTrustFile
setting.
Prior to 4.2.0 you will need to create a file, for example CAcerts.pem, containing one or more trusted certificates in PEM format.
Then start MultiES with the NODE_EXTRA_CA_CERTS environment variable set to the path of the file you just created, for example:
NODE_EXTRA_CA_CERTS=./CAcerts.pem /opt/arkime/bin/node multies.js -c /opt/arkime/etc/config.ini -n allnode
An admin can change anyone's password on the Users tab by clicking the Settings link in the Actions column next to the user.
A password can also be changed by using the addUser
script, which will replace the entire account if the same userid is
used. All preferences and views will be cleared, so creating a
secondary admin account may be a better option if you need to change
an admin user's password. After creating a secondary admin account,
change the user's password and then delete the secondary admin account.
node addUser -c <configfilepath> <user id> <user friendly name> <password> [--admin]
Viewers have the ability to proxy traffic for each other. The ability relies on Arkime node names that are mapped to hostnames. Common problems are when systems don't use FQDNs or certs don't match.
First the SPI records are created on the capture
side.
capture
gets a nodename, either by the
-n
command line option or everything in front of the
first period of the hostname.
capture
writes a stats record every few
seconds that has the mapping from the nodename to the FQDN.
It is possible to override the FQDN with the --host
option to capture.
When PCAP is retrieved from a viewer it uses the nodename associated with the SPI record to find which capture host to connect to.
arkime-viewer
process gets a nodename, either by
the -n
command line option or everything in front of the
first period of the hostname.
arkime-viewer
nodename it can be processed locally, STOP HERE. This is the common
case with one arkime node.
stats[nodename].hostname
is the same as the
arkime-viewer
's hostname (exact match) then it can be
processed locally, STOP HERE.
Remember this is written by capture above, either the FQDN or --host
.
This is the common case with multiple capture processes per capture node.
viewUrl
set in the [nodename]
section, use that.
[default]
section, use that.
stats[nodename].hostname:[nodename section - viewPort setting]
stats[nodename].hostname:[default section - viewPort setting]
stats[nodename].hostname:8005
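The lookup order described above can be summarized in a few lines. This is a sketch of the steps as listed in this section, not the viewer's actual source: the function names, the `stats` dictionary (nodename to the hostname capture recorded), and the `config` dictionary (section name to settings) are stand-ins chosen for illustration.

```python
def node_name(hostname, n_option=None):
    """Node name: the -n value, else everything before the first period."""
    return n_option or hostname.split(".", 1)[0]


def locate_pcap(target, my_node, my_hostname, stats, config):
    """Return "local" or the address used to reach target's viewer."""
    if target == my_node:
        return "local"
    host = stats[target]["hostname"]  # written by capture (FQDN or --host)
    if host == my_hostname:           # exact match only
        return "local"
    # A viewUrl in the node section wins, then one in [default].
    for section in (target, "default"):
        url = config.get(section, {}).get("viewUrl")
        if url:
            return url
    # Otherwise build host:port from viewPort, falling back to 8005.
    for section in (target, "default"):
        port = config.get(section, {}).get("viewPort")
        if port:
            return f"{host}:{port}"
    return f"{host}:8005"


stats = {"node2": {"hostname": "node2.example.com"}}
config = {"default": {"viewPort": "8005"},
          "node2": {"viewUrl": "http://node2.fqdn"}}
print(locate_pcap("node2", "node1", "node1.example.com", stats, config))
```

Walking a failing request through this order usually shows whether the fix belongs in the hostname, a viewUrl, or a viewPort setting.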
First, look at viewer.log
on both the viewer machine and
the remote machine and see if there are any obvious errors. The most
common problems are:
config.ini
on all nodes can make
things a pain to debug and sometimes not even work. It is best to
use the same config with different sections for each node name
[nodename]
hostname
command AND the viewer machine can't resolve
just the hostname. To fix this, do ONE of the following:
--host
option to capture
and restart capture
hostname "fullname"
as root and edit
/etc/sysconfig/network
)
viewUrl
in each node section of the
config.ini
. If you don't have a node section for
each host, you'll need to create one.
/etc/resolv.conf
and add
search foo.example.com
, where
foo.example.com
is the subdomain of the hosts.
Basically, you want it so "telnet shortname 8005" works on the
viewer machine to the remote machine.
dropUser
user or dropGroup
group can read the PCAP files.
Check the directories in the path too.
viewUrl
for each node.
-n
option
viewUrl
for that old node name that points to the new host.
Arkime uses Node.js for the viewer component, and requires many
packages to work fully. These packages must be compiled with and run
using the same version of Node.js. An error like … was compiled
against a different Node.js version using NODE_MODULE_VERSION 48. This
version of Node.js requires NODE_MODULE_VERSION 57.
means that
the version of Node.js used to install the packages and run the
packages are different.
This shouldn't happen when using the prebuilt Arkime releases. If it
does, then double check that /opt/arkime/bin/node
is
being used to run viewer.
If you built Arkime yourself, this usually happens if you have a different version of node in your path. You will need to rebuild Arkime and either:
/opt/arkime/bin
is in your path before the OS
version of node
--install
option to easybutton which will add to
the path for you
By default viewer listens on port 8005. Changing this can be tricky, especially for a port less than 1024, like 443. You should definitely read the How do viewers find each other section.
Scenario | Solutions |
---|---|
Change all nodes to port > 1024 | Set viewPort in [default] section on ALL nodes |
Change single node port < 1024, remaining nodes (if any) unchanged |
Usually unless a program runs as root it can NOT listen to ports less than 1024.
Since viewer by default drops privileges before listening, even if you start as root, it isn't root anymore when trying to listen on the port.
Possible solutions are:
|
All nodes, port < 1024 | Just don't. :) If you must, most of the solutions above will work, but don't do the reverse proxy solution since viewer nodes need to talk to each other WITHOUT external authentication. |
Hunts require a single viewer node be specified to actually coordinate the hunts.
Since 4.3.1 the preferred method is to set cronQueries=auto
in the [default]
section of the config.ini file for nodes that should be eligible to run hunts.
This will automatically select a single node to run hunts, and if that node goes down, another node will be selected.
If using central viewers, it is recommended to only set cronQueries=auto
on the central viewers.
You can also force hunts to run on a specific node by setting cronQueries=true
in the [default]
section of the config.ini file.
You must only set cronQueries=true
on one and only one node.
If cronQueries is properly set up on a single node, and hunts still aren't working, make sure the cronQueries node is running and checking in. You can check this on the Stats -> ES Nodes tab and/or check the viewer logs.
See more information about the cronQueries setting here.
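The cronQueries rules above (any number of `auto` nodes, but at most one `true` node) can be sanity-checked across a deployment. This is a hypothetical helper, not an Arkime tool: the function name and the input mapping of node name to its cronQueries value are assumptions, and gathering those values from each host's config.ini is left to you.

```python
def cron_queries_problems(settings):
    """Return a list of problems with a node -> cronQueries mapping.

    Values are the raw setting text ("auto", "true", "false") or None
    when the setting is absent on that node.
    """
    normalized = {node: str(value).strip().lower()
                  for node, value in settings.items() if value is not None}
    forced = sorted(n for n, v in normalized.items() if v == "true")
    auto = sorted(n for n, v in normalized.items() if v == "auto")

    problems = []
    if len(forced) > 1:
        problems.append("cronQueries=true on more than one node: "
                        + ", ".join(forced))
    if not forced and not auto:
        problems.append("no node is eligible to run hunts")
    return problems


print(cron_queries_problems({"viewer1": "true", "viewer2": "true", "cap1": None}))
```

An empty list only means the settings are consistent; the chosen node still has to be running and checking in, as described above.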
Here is the common check list:
curl http://localhost:3218/settings#integrations
/opt/arkime/logs/cont3xt.log
file.
cont3xt
after adding a
debug=2
in the [cont3xt]
section may print out useful information about what is wrong.
Parliament is designed to run behind a reverse proxy such as Apache.
Basically, you just need to tell Apache to send all root requests and any
/parliament
requests to the Parliament server.
ProxyPassMatch ^/$ http://localhost:8008/parliament retry=0
ProxyPass /parliament/ http://localhost:8008/parliament/ retry=0
Here is the common check list:
curl http://localhost:8081/fields
You should see a list of fields that WISE knows about.
wise.so
to the plugins=
line.
wise.js
to the viewerPlugins=
line.
wiseURL
has been set, or the older wiseHost
and wisePort
curl http://WISEHOST:8081/fields
capture
after adding a
--debug
option may print out useful information about what is
wrong. Look to make sure that WISE is being called with the correct URL.
Verify that the plugins, wiseHost, and wiseURL settings are what you think they are.
Want to add or edit this FAQ? Found an issue on this site? This site's code is open source. Please contribute!