If you want a standalone open-source full packet capture (FPC) system with metadata parsing and searching, then Arkime is the solution! Full packet capture systems allow network and security analysts to see exactly what happened from a network point of view. Since Arkime is open-source, you have complete control of the deployment and architecture. There are other FPC systems available.
How do you pronounce Arkime (/ɑːrkɪˈmi/)? Read more about why we changed our name here.
Upgrading Arkime requires you to install major versions in order, as described in the chart below. If your current version isn't listed, please upgrade to the next-highest version in the chart; you can then install the major releases in order to catch up. New installs can start from the latest version. Unless otherwise stated, you should only need to run `db.pl upgrade` between versions.
| Name | Version | Min Version to Upgrade From | OpenSearch Versions | Elasticsearch Versions | Special Instructions | Notes |
|---|---|---|---|---|---|---|
| Arkime | 4.0+ | 3.3.0+ (3.4.0 recommended) | 1.0.0+ (2.3 recommended) | 7.10+ | Arkime 4.x instructions | |
| Arkime | 3.0+ | 2.4.0 | 1.0.0+ (2.3 recommended) | 7.10+, not 8.x | Arkime 3.x instructions | |
| Arkime | 2.7+ | 2.0.0 | N/A | 7.4+ (7.9.0+ recommended, 7.7.0 broken) | Elasticsearch 7 instructions | |
| Moloch | 2.2+ | 1.7.0 (1.8.0 recommended) | N/A | 6.8.2+ (6.8.6+ recommended), 7.1+ (7.8.0+ recommended, 7.7.0 broken) | Moloch 2.x instructions | Must already be on 6.8.x or 7.1+ before upgrading to 2.2 |
| Moloch | 2.0, 2.1 | 1.7.0 (1.8.0 recommended) | N/A | 6.7, 6.8, 7.1+ | Moloch 2.x instructions | Must already be on Elasticsearch 6.7 or 6.8 (6.8.6 recommended) before upgrading to 2.0 |
| Moloch | 1.8 | 1.0.0 (1.1.x recommended) | N/A | 5.x or 6.x | Elasticsearch 6 instructions | Must have finished the 1.x reindexing; stop captures for best results |
| Moloch | 1.1.1 | 0.20.2 (0.50.1 recommended) | N/A | 5.x or 6.x (new only) | Instructions | Must be on Elasticsearch 5 already |
| Moloch | 0.20.2 | 0.18.1 (0.20.2 recommended) | N/A | 2.4, 5.x | Elasticsearch 5 instructions | |
We have RPMs/DEBs/ZSTs available on the downloads page. Our deployment is on RHEL 7 and RHEL 8, using both the pcap and afpacket reader, depending on the deployment. We recommend using afpacket (tpacketv3) whenever possible. A large amount of development is done on macOS 12.5 using MacPorts or Homebrew; however, it has never been tested in a production setting. :) Arkime is no longer supported on 32-bit machines. Currently we do not support Ubuntu releases that aren't LTS and there may be library issues.
The following operating systems should work out of the box:
Here is the common checklist to perform when diagnosing a problem with Arkime (replace /opt/arkime with /data/moloch for Moloch builds):
- Run `curl http://localhost:9200/_cat/health` on the machine running OpenSearch/Elasticsearch. An Unauthorized response probably means that you need user:pass in all OpenSearch/Elasticsearch URLs or that you are using the wrong URL.
- Run the `/opt/arkime/db/db.pl http://elasticsearch.hostname:9200 info` command. You should see information about the database version and number of sessions.
- Visit `http://arkime-viewer.hostname:8005` from your browser. Verify that `viewHost=localhost` is NOT set in the config.ini file. Test that `curl http://IP:8005` works from the host viewer is running on.
- Check `/opt/arkime/logs/viewer.log` and that viewer is running with the `pgrep -lf viewer` command. If the UI looks strange or isn't working, viewer.log will usually have information about what is wrong.
- Check `/opt/arkime/logs/capture.log` and that capture is running with the `pgrep -lf capture` command. If packets aren't being processed or there are other metadata generation issues, capture.log will usually have information about what is wrong and links to the FAQ on how to fix them.
- Visit `http://arkime-viewer.hostname:8005/stats?statsTab=1` in your browser.
- Check `capture.log` for errors.
- Try removing any `bpf=` filter in `/opt/arkime/etc/config.ini`. If that fixes the issue, read the BPF FAQ answer.
- If testing with a PCAP file or on a low-traffic network, edit `/opt/arkime/etc/config.ini` to shorten the timeouts, then run the `curl http://elasticsearch.hostname:9200/_refresh` command.
- Running `capture` after adding a `--debug` option may print out useful information about what is wrong. You can add multiple `--debug` options to get even more information. Capture will print out the configuration settings it is using; verify that they are what you expect. Usually this setting is changed in `/etc/systemd/system/molochcapture.service`. Then run `systemctl daemon-reload`.
- Running viewer after adding a `--debug` option may print out useful information about what is wrong. Usually this setting is changed in `/etc/systemd/system/molochviewer.service`. Then run `systemctl daemon-reload`.
- Verify that the config file is at `/opt/arkime/etc/config.ini` and readable by the viewer process.
- Run `grep moloch_packet_log /opt/arkime/logs/capture.log | tail`. Verify that the packets number is greater than 0; if not, no packets were processed. Verify that the first pstats number is greater than 0; if not, Arkime didn't know how to decode any packets.
Use the `db.pl` script with either the `init` or `wipe` commands. The only difference between the two commands is that `wipe` leaves the added users so that they don't need to be re-added.
/opt/arkime/db/db.pl http://ESHOST:9200 wipe
/bin/rm -f /opt/arkime/raw/*
The core Arkime team does not support or recommend self-signed certificates, although it is possible to make them work. We suggest using your cost savings from using a commercial full capture product to purchase certificates. Wildcard certificates are now inexpensive, and you can even choose free Lets Encrypt certificates. Members of the Arkime Slack workspace may be willing to help out, but the core developers may just link to this answer. Private CA certificates will have the same issues and solutions as self-signed certificates.
Potentially the easiest solution is to add the self-signed certificate to the operating system's list of valid certificates or chains.
Googling is the best way to figure out how to do this—it is different for almost every OS release and version.
You may need to add the certificate to several lists because node (viewer), curl (capture), and perl (db.pl) sometimes use different locations for their list of trusted certificates.
Viewer supports a `caTrustFile` option that was contributed to the project. Since 4.2.0, all pieces of Arkime should support the `caTrustFile` setting.
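The `caTrustFile` setting takes a file path; as a sketch (the bundle path below is an assumption, not a shipped default), pointing Arkime at a PEM file containing your private CA chain might look like:

```ini
# config.ini - hypothetical path; put your full CA chain in one PEM bundle
caTrustFile=/opt/arkime/etc/ca-bundle.pem
```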
Another option is to just turn off certificate checking. Capture, viewer, arkime_add_user.sh, and db.pl can run with `--insecure` to turn off certificate checking. You will need to add this option to the startup command for both capture and viewer. For example, in the `/etc/systemd/system/arkimecapture.conf` file, change the `ExecStart` line from `... capture -c ...` to `... capture --insecure -c ...`. You would need to do the same thing for any viewer systemd files.
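If you prefer not to edit the packaged unit file, a systemd drop-in achieves the same result; this is a sketch with an assumed unit name and paths, so match them to your install:

```ini
# /etc/systemd/system/arkimecapture.service.d/insecure.conf (hypothetical drop-in)
[Service]
# The empty ExecStart= clears the packaged command before replacing it
ExecStart=
ExecStart=/bin/sh -c '/opt/arkime/bin/capture --insecure -c /opt/arkime/etc/config.ini'
```

Run `systemctl daemon-reload` afterwards, as with any unit change.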
Moloch 1.x has some large changes and updates that will require all session data to be reindexed. The reindexing is done in the background AFTER upgrading, so there is little downtime. Large changes in 1.0 include the following:
- IP addresses are now stored using the `ip` type.
- If you have any special parsers, taggers, plugins, or WISE sources, you may need to change configurations.
To upgrade:
- Run `/data/moloch/bin/moloch_update_geo.sh` on all capture nodes; it will download the new mmdb-style MaxMind files.
- Run `db.pl http://ESHOST:9200 upgrade` once.
Once 1.1.1 is working, you need to reindex the old session data:
- From the `/data/moloch/viewer` directory, run `/data/moloch/viewer/reindex2.js --slices X`.
- When the reindexing is finished, delete the old indices: `curl -XDELETE 'http://localhost:9200/sessions-*'`
Upgrading to Moloch 2.x is a multistep process that requires an outage. An outage is required because all the captures must be stopped before upgrading the database so that there are no schema issues or corruption. Most of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with 6.7 or 6.8. This is very important when upgrading to Elasticsearch 7.x or later.
Run `./db.pl http://ESHOST:9200 backup pre20` to back up all administrative indices. Then run `./db.pl http://ESHOST:9200 upgrade`.

Upgrading to Arkime 3.x is a multistep process that requires an outage. An outage is required because all the captures MUST be stopped before upgrading the database so that there are no schema issues or corruption. Do not restart the capture processes until the db.pl upgrade has finished! All of the administrative indices will have new version numbers after this upgrade so that Elasticsearch knows they were created with version 7. This is very important when upgrading to Elasticsearch 8.x or later.
- If you are using ILM, note that it is not renamed to `arkime_` yet; you will run the `db.pl ilm` command again after upgrading.
- Run `./db.pl http://ESHOST:9200 backup pre30` to back up all administrative indices.
- Run `./db.pl http://ESHOST:9200 upgrade [other options]`, and don't forget to include any other options you usually use, like --replicas or --ilm.
- If you are using ILM, run the `db.pl ilm` command again with all the same options that were used previously.

Upgrading to Arkime 4.x requires that you are already using Arkime 3.3.0 or later. Arkime 4.x uses a new permissions model with roles.
- When adding users, use the `--roles` option; the `--admin` option sets the `superAdmin` role.
- Run `./db.pl http://ESHOST:9200 upgrade [other options]`, and don't forget to include any other options you usually use, like --replicas or --ilm.

Arkime supports both OpenSearch and Elasticsearch, and our goal is to continue to support both. Some older documentation and settings may only refer to Elasticsearch, but OpenSearch should work for Arkime versions supporting Elasticsearch 7+. As OpenSearch and Elasticsearch diverge, we may add features that are only enabled based on which is being used. Arkime will never require any Elasticsearch pay features but may optionally support them.
The answer, of course, is "it depends." Factors include:
The following are some important things to remember when designing your cluster:
We have some estimators that may help.
The good news is that it is easy to add new nodes in the future, so feel free to start with fewer nodes.
As a temporary fix for capacity problems, you can reduce the number of days of metadata that are stored.
You can use the Arkime ES Indices tab to delete the oldest `sessions2` or `sessions3` index.
The SPI data in OpenSearch/Elasticsearch and the PCAP data are not deleted at the same time.
The PCAP data is deleted as the disk fills up on the capture machines. See here for more information.
PCAP deletion happens automatically, and nothing needs to be done.
The SPI data is either deleted by using ILM or when the `./db.pl expire` command is run, usually from cron during off-peak hours.
Unless you use ILM, the SPI data deletion does NOT happen automatically, and a cron job MUST be set up.
A cron setup that only keeps 90 days of data and expires at midnight might look like this:
0 0 * * * /opt/arkime/db/db.pl http://localhost:9200 expire daily 90
So deleting a PCAP file will NOT delete the SPI data, and deleting the SPI data will not delete the PCAP data from disk.
The UI does have commands to delete and scrub individual sessions, but the user must have the `Remove Data` ability on the users tab. This feature is intended for things you don't want operators to see, such as bad images, and not as a general solution for freeing disk space.
This error means that your OpenSearch/Elasticsearch cluster cannot keep up with the number of sessions that the capture nodes are trying to send it, or that too many messages are being sent. You may only see the error message on your busiest capture nodes because capture tries to buffer the requests.
Check the following:
- Use `--replicas 1` with your daily `./db.pl expire` run after turning off replication in the sessions template, which is done by running `./db.pl upgrade` without the `--replicas` option.
- If you change these settings, run `./db.pl upgrade` again.
If these don’t help, you need to add more nodes or reduce the number of sessions being monitored. You can reduce the number of sessions with packet-drop-ips, bpf filters, or rules files, for example.
If queries are too slow, the easiest fix is to add additional OpenSearch/Elasticsearch nodes.
OpenSearch/Elasticsearch doesn’t perform well if Java hits an OutOfMemory condition.
If you ever have one, you should immediately delete the oldest `*sessions*` index, update the daily expire cron to delete more often, and restart the OpenSearch/Elasticsearch cluster. Then you should order more machines. :)
curl -XPUT 'localhost:9200/sessions*/_settings' -H 'Content-Type: application/json' -d '{
"index.routing.allocation.total_shards_per_node": 2
}'
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient":{
"indices.recovery.concurrent_streams":6,
"indices.recovery.max_bytes_per_sec":"50mb"}
}'
Turning on replication will consume twice the disk space on the nodes and increase the network bandwidth between nodes, so make sure you actually need replication.
To change future days, run the following command:
db/db.pl <http://ESHOST:9200> upgrade --replicas 1
To change past days but not the current day, run the following command:
db/db.pl <http://ESHOST:9200> expire <type> <num> --replicas 1
We recommend the second solution because it allows current traffic to be written to OpenSearch/Elasticsearch once, and during off peak the previous day's traffic will be replicated.
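Combining this with the daily expire cron example from earlier, the off-peak replication approach might look like the following crontab entry (host, retention, and schedule are illustrative):

```
# Keep 90 days, expire daily at midnight, replicate the previous day's indices
0 0 * * * /opt/arkime/db/db.pl http://localhost:9200 expire daily 90 --replicas 1
```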
In general, if upgrading between minor or build versions of Elasticsearch, you can perform a rolling upgrade with no issues. Follow Elastic's instructions for the best results. Make sure you select the matching version of that document for your version of Elasticsearch from the dropdown menu on the right side of the screen.
Upgrading between major versions of Elasticsearch usually requires an upgrade of Arkime. See the following instructions:
Elasticsearch 8.x is NOT supported before 3.4.1, and we recommend that you use Arkime 4.x.
- Run `db.pl http://ESHOST:9200 upgrade` while still using Elasticsearch 7.
- To find what DB version you are using, either run `db.pl http://ESHOST:9200 info` or mouse over the icon in Arkime.
- If you have set `indices.breaker.total.limit`, you should unset it.
- Verify cluster health with `curl http://localhost:9200/_cat/health`.
Elasticsearch 6.x is supported by Moloch 1.x for NEW clusters and >= 1.5 for UPGRADING clusters.
NOTE – If upgrading, you must FIRST upgrade to Moloch 1.0 or 1.1 (1.1.1 is recommended) before upgrading to > 1.5. Also, all reindex operations need to be finished.
We do NOT provide Elasticsearch 6 startup scripts or configuration, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.
Upgrading to Elasticsearch 6 will REQUIRE two downtimes.
First outage: If you are NOT using Moloch DB version 51 (or later), you must follow these steps while still using Elasticsearch 5.x. To find what DB version you are using, either run `db.pl http://ESHOST:9200 info` or mouse over the icon in Moloch. Run `./db.pl http://ESHOST:9200 upgrade`.
Second outage: Upgrade to Elasticsearch 6.
/data/foo/<clustername>.
curl http://localhost:9200/_cat/health
Elasticsearch 5.x is supported by Moloch 0.17.1 for NEW clusters and 0.18.1 for UPGRADING clusters.
Elasticsearch 5.0.x, 5.1.x, and 5.3.0 are NOT supported because of Elasticsearch bugs/issues. We currently use 5.6.7.
WARNING – If you have `sessions-*` indices created with Elasticsearch 1.x, you can NOT upgrade. Those indices will need to be deleted.
We do NOT provide Elasticsearch 5 startup scripts, so if upgrading, make sure you get startup scripts working on test machines before shutting down your current cluster.
Upgrading to Elasticsearch 5 may REQUIRE 2 downtime periods of about 5–15 minutes each.
First outage: If you are NOT using Moloch DB version 34 (or later), you must follow these steps while still using Elasticsearch 2.4. To find what DB version you are using, either run `db.pl http://ESHOST:9200 info` or mouse over the icon in Moloch.
curl http://localhost:9200/_cat/health
Run `./db.pl http://ESHOST:9200 upgrade`.
Second outage: Upgrade to Elasticsearch 5.
curl http://localhost:9200/_cat/health
This error usually happens when the capture process is trying to update the stats data and falls behind.
Arkime will continue to function while this error occurs with the stats or dstats index; however, it does usually mean that your Elasticsearch cluster is overloaded.
You should consider increasing your Elasticsearch capacity by adding more nodes, CPU, and/or more memory.
If increasing Elasticsearch capacity isn't an option, then reduce the amount of traffic that Arkime processes.
If the N vs. M version numbers are very different from each other, it usually means that you are running two nodes with the same node name at the same time, which is not supported.
Here are some of our recommended OpenSearch/Elasticsearch settings. Many of these can be updated on the fly, but it is still best to put them in your elasticsearch.yml file. We strongly recommend using the same elasticsearch.yml file on all hosts. Things that need to be different per host can be set with variables.
You will probably want to change the watermark settings so you can use more of your disk space. You have the option to use ALL percentages or ALL values, but you can't mix them. The most common sign of a problem with these settings is an error that has `FORBIDDEN/12/index read-only / allow delete` in it. You can use `./db.pl http://ESHOST:9200 unflood-stage _all` to clear the error once you adjust the settings and/or delete some data. See the Elasticsearch docs.
cluster.routing.allocation.disk.watermark.low: 97%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99%
Or, if you want more control, use values instead of percentages:
cluster.routing.allocation.disk.watermark.low: 300gb
cluster.routing.allocation.disk.watermark.high: 200gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
If you have a lot of shards that you want to be able to search against at once (see the Elasticsearch docs):
action.search.shard_count.limit: 100000
This no longer needs to be set since Elasticsearch 7.9.
If you hit a lot of bulk failures, this can help, but Elastic doesn't recommend raising it too much. In older versions of Elasticsearch, it is named `thread_pool.bulk.queue_size`, so check the docs for your version. See the Elasticsearch docs.
thread_pool.write.queue_size: 10000
On by default in most versions, allows for HTTP compression. Elasticsearch Docs
http.compression: true
To speed up recovery times and startup times, there are a few controls to experiment with. Make sure you test them in your environment and slowly increase them because they can break things badly. Elasticsearch Allocation Docs and Elasticsearch Recovery Docs
cluster.routing.allocation.cluster_concurrent_rebalance: 10
cluster.routing.allocation.node_concurrent_recoveries: 5
cluster.routing.allocation.node_initial_primaries_recoveries: 5
indices.recovery.max_bytes_per_sec: "400mb"
By default, Elasticsearch has logging set to debug level in log4j2.properties. For busy clusters, change this to info level to lower CPU and disk usage.
logger.action.level = info
Since Moloch 2.2, you can easily use ILM to move indices from hot to warm, force merge, and delete.
We recommend only using ILM with newer versions (7.2+) of Elasticsearch because older versions had some issues.
Once ILM is enabled, you no longer have to use the `db.pl expire` cron job, but you should occasionally run `db.pl optimize-admin`.
ILM is only included in the free "basic" Elasticsearch license, so it is not part of the Elasticsearch OSS distribution, and you may need to upgrade. For search performance reasons, Arkime does NOT currently support the ILM auto rollover feature.
These instructions assume you are using db.pl or the Arkime UI to set up ILM, which will use a special `molochtype` attribute name. You can also use Kibana to create the ILM config without the `molochtype` attribute name, but you will then need to do everything on your own.
In order for ILM to work correctly with Arkime, follow these five important steps:
1. On each OpenSearch/Elasticsearch data node, set `node.attr.molochtype: warm` or `node.attr.molochtype: hot`.
2. Create the ILM policy with the `db.pl ilm` command.
3. Assign the policy to the sessions indices; `db.pl ilm` can perform this action.
4. Run `db.pl upgrade ... --ilm`, adding `--ilm` to the command. Also add `--hotwarm` if using a hot/warm design.
5. Replace the `db.pl expire` cron job with `db.pl optimize-admin`.
So for example, to create a new policy that keeps 30 weeks of history, 90 days of SPI data, 1 replica, and optimizes all indices older than 25 hours, you would run:
./db.pl http://localhost:9200 ilm 25h 90d --history 30 --replicas 1
You would then need to run upgrade with all the arguments you usually use, plus --ilm:
./db.pl http://localhost:9200 upgrade --replicas 1 --shards 5 --ilm
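For a hot/warm design, the node attribute mentioned above goes in each node's elasticsearch.yml; a minimal sketch:

```yaml
# elasticsearch.yml on a hot-tier node; use "warm" instead on warm-tier nodes
node.attr.molochtype: hot
```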
The goal of Arkime is to use commodity hardware. If you start thinking about using SSDs or expensive NICs, research whether it would just be cheaper to buy one more box. This gains more retention and can bring down the cost of each machine.
Some things to remember when selecting a machine:
- Be careful with links faster than 10G per `capture` process, because 8 Gbps is close to the max a single `capture` process can handle.
When selecting Arkime capture boxes, standard "Big Data" boxes might be the best bet ($10k–$25k each). Look for:
We are big fans of using network packet brokers (NPBs) ($6k+). They allow multiple taps/mirrors to be aggregated and load balanced across multiple `capture` machines. Read more in the following sections.
We are big fans of using NPBs, and we recommend that medium or large Arkime deployments use an NPB. See the MolochON 2017 NPB Preso.
Main advantages:
Features to look for:
Just like with Arkime with commodity hardware, you don’t necessarily have to pay a lot of money for a good NPB. Some switch vendors offer switches that can operate in switch mode or NPB mode, so you might already have gear you can use.
Sample vendors
How much traffic can `arkime-capture` handle?
On basic commodity hardware, it is easy to get 3 Gbps or more, depending on the number of CPUs available to Arkime and what else the machine is doing. Many times, the limiting factor is the speed of the disks and RAID system. See Architecture and Multiple Host for more information. Arkime allows multiple threads to be used to process the packets.
To test the local RAID device, use:
dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
To test a NAS, leave off the oflag=direct and make sure you test with at least 3x the amount of memory so that cache isn't a factor:
dd bs=256k count=150000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test
This is the MAX disk performance. Run several times if desired and take the average. If you don't want to drop any packets, you shouldn't average more than ~80% of the MAX disk performance. If you are using RAID and don't want to drop packets during a future rebuild, ~60% is a better value. Remember that most network numbers will be in bits, while the disk performance will be in bytes, so you'll need to adjust the values before comparing.
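As a quick sanity check of that bits-versus-bytes math (the numbers are illustrative, not a benchmark), shell arithmetic converts a dd result in MB/s into a usable capture rate in Mbps:

```shell
# 500 MB/s measured by dd; keep ~80% headroom; multiply by 8 for bytes -> bits
echo $((500 * 8 * 80 / 100))   # prints 3200, i.e. ~3.2 Gbps usable
```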
When you get an error about the capture length not matching the packet length, it is NOT an issue with Arkime. The issue is with the network card settings.
By default, modern network cards offload work that the CPUs would otherwise need to do. They will defragment packets or reassemble TCP sessions and pass the results to the host. However, this is NOT what we want for packet captures; we want what is actually on the network. So you will need to configure the network card to turn off all the features that hide the real packets from Arkime.
The sample config files (`/opt/arkime/bin/arkime_config_interfaces.sh`) turn off many common features, but there are still some possible problems:
- Run `ethtool -k INTERFACE | grep on`; anything that is still on, turn off and see if that fixes the problem. Items that say `[fixed]` can NOT be disabled with ethtool.
- ethtool -K INTERFACE tx off sg off gro off gso off lro off tso off
There are two workarounds:
- Set `readTruncatedPackets=true` in the config file; this is the only solution for saved .pcap files.
- Set `snapLen=65536` in the config file; this is not recommended.
There are several different types of packet drops and reasons for packet drops:
Please make sure you are using a recent version of Arkime. Constant improvements are made, and it is hard for us to support older versions.
The most common cause of packet drops with Arkime is leaving the reader default of libpcap instead of switching to tpacketv3, pfring, or one of the other high-performance packet readers. We strongly recommend tpacketv3. See plugin settings for more information.
Make sure the network card is configured correctly by increasing the ring buf to max size and turning off most of the card’s features. The features are not useful anyway, since we want to capture what is on the network instead of what the local OS sees. Example configuration:
# Set ring buf size, see max with ethtool -g eth0
ethtool -G eth0 rx 4096 tx 4096
# Turn off features, see available features with ethtool -k eth0
ethtool -K eth0 rx off tx off sg off tso off gso off
If Arkime was installed from the deb/rpm and the Configure script was used, this should already be done in `/data/moloch/bin/moloch_config_interfaces.sh`.
The packetThreads config option controls the number of threads processing the packets, not the number of threads reading the packets off the network card. You only need to change the value if you are getting the `Packet Q is overflowing` error. The packetThreads option is limited to 24 threads, but usually you only need a few. Configuring too many packetThreads is actually worse for performance; start with a lower number and slowly increase it. You can also change the size of the packet queue by increasing the maxPacketsInQueue setting. To increase the number of threads the reader uses, please see the documentation for the reader you are using on the settings page.
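As a config.ini sketch (the numbers are illustrative starting points, not recommendations):

```ini
# Only raise these if capture.log shows the "Packet Q is overflowing" error
packetThreads=3
maxPacketsInQueue=300000
```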
In general, errors about the Disk Q being exceeded are NOT a problem with Arkime but usually an issue with either the hardware or the packet rate exceeding what the hardware can save to disk. You will usually need to either fix/upgrade the hardware or reduce the amount of traffic being saved to disk.
dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct
This is the MAX disk performance. Run several times if desired and take the average. If you don't want to drop any packets, you shouldn't average more than ~80% of the MAX disk performance. If you are using RAID and don't want to drop packets during a future rebuild, ~60% is a better value. Remember that most network numbers will be in bits while the disk performance will be in bytes, so you'll need to adjust the values before comparing.
If using xfs, make sure you use the mount options `defaults,inode64,noatime`.
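For example, the corresponding /etc/fstab entry might look like this (the device and mount point are assumptions for illustration):

```
/dev/sdb1  /opt/arkime/raw  xfs  defaults,inode64,noatime  0 0
```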
If using EMC for disks:
To check your disk IO, run `iostat -xm 5` and look at the following:
- `avgqu-sz` should be near or less than 1; otherwise Linux is queueing I/O instead of doing it, which will slow `capture` down.
Other things to do/check:
- Enable write caching on the RAID controller/drives, for example:
  - Adaptec: `arcconf SETCACHE 1 LOGICALDRIVE 1 WBB`
  - HP Smart Array: `hpssacli ctrl slot=0 modify dwc=enable`
  - LSI MegaRAID: `MegaCli64 -LDSetProp -ForcedWB -Immediate -Lall -aAll ; MegaCli64 -LDSetProp Cached -L0 -a0 -NoLog`
- Check that the host `capture` is running on isn't handling lots of interrupts (`cat /proc/interrupts`).
- If using WISE, check that it is responding quickly: run `curl http://arkime-wise.hostname:8081/views` on the `capture` host that is dropping packets.
- See the settings page.
Think of the `capture` binary much like you would `tcpdump`. The `capture` binary can listen to live network interface(s) or read from historic packet capture files.
Currently Arkime works best with PCAP files, not PCAPng.
${install_dir}/bin/capture -c [config_file] -r [PCAP file]
For an entire directory, use `-R [PCAP directory]`. See `${install_dir}/bin/capture --help` for more info.
The `--monitor` option to monitor non-NFS directories, `--skip` to skip already loaded PCAP files, and `-R` to process directories are common options. Multiple `-r` and `-R` options can be used.
If Arkime is failing to load a PCAP file, check the following things:
- Run with `--debug`, which might warn of not understanding the link type or GRE tunnel type. (Please open issues for unknown link or GRE types.)
It is also possible to enable the UI in Arkime to upload PCAP. This is less efficient than just using `capture` directly, since it uploads the file and then runs `capture` for you. Just uncomment the `uploadCommand` in the config.ini file.
The easy way is using the interface setting in your config.ini. It supports a semicolon ';' separated list of interfaces to listen on for live traffic. If you want to set a tag or another field per interface, use the interfaceOps setting.
The hard way: you can also have multiple `capture` processes. Edit `/opt/arkime/etc/config.ini` and create a section for each of the Arkime nodes. Assuming the defaults are correct in the `[default]` section, the only thing that MUST be set is the interface item. It is also common to have each Arkime node talk to a different OpenSearch/Elasticsearch node if running a cluster of OpenSearch/Elasticsearch nodes. The `arkime-m01` is an EXAMPLE node name.

[arkime-m01a]
interface=eth2

[arkime-m01b]
interface=eth5
If `hostname` + `domainname` on the machine doesn't return a FQDN, you'll also need to set a viewUrl, or, easier, use the `--host` option.
mv /etc/systemd/system/arkimecapture.service arkimecapture1.service
cp /etc/systemd/system/arkimecapture1.service /etc/systemd/system/arkimecapture2.service
Edit the two files so one has `ExecStart=/bin/sh -c '/opt/arkime/bin/capture -n arkime-m01a -c /opt/arkime/etc/config.ini` and the other has `ExecStart=/bin/sh -c '/opt/arkime/bin/capture -n arkime-m01b -c /opt/arkime/etc/config.ini`.
systemctl daemon-reload; systemctl start arkimecapture1; systemctl start arkimecapture2
You only need to run one viewer on the machine. Unless it is started with the `-n` option, it will still use the hostname as the node name, so any special settings need to be set there (although the default is usually good enough).
Please file an issue on github with the stack trace.
Use `sysctl` to change the setting until the next reboot. Setting it to 0 will change it back to the default.
sysctl -w fs.suid_dumpable=2
The core file will be written to the directory `capture` is running in.
- Run `capture` and get it to crash.
- Load the core file into `gdb` (you may need to install the gdb package first): `gdb /opt/arkime/bin/capture corefilename`
- Get a stack trace with the `bt` command.
If it is easy to reproduce, sometimes it's easier to just run `gdb` as root:
- Start `gdb capture` as root.
- Start capture inside `gdb` with `run ALL_THE_ARGS_USED_FOR_ARKIME-CAPTURE_GO_HERE`.
- After the crash, get a stack trace with the `bt` command.
- If capture is exiting via a glib error, you can break on `g_log` first with the `b g_log` command.
Usually `capture` is started as root so that it can open the interfaces, and then it immediately drops privileges to `dropUser` and `dropGroup`, which are by default `nobody:daemon`. This means that all parent directories need to be either owned or at least executable by nobody:daemon and that the pcapDir itself must be writeable.
Listed in order from highest to lowest benefit to Arkime:
- A `bpf=` filter will stop Arkime from seeing the traffic.
- A `packet-drop-ips` section will stop Arkime from adding packets to the PacketQ.
Arkime capture supports many options for controlling which packets are captured, processed, and saved to disk.
- First, packets are filtered by the `bpf=` filter in the config file. This filter can be implemented in the kernel, the network card, libpcap, or network drivers. It is a single filter, and it controls what Arkime capture "sees" or doesn't "see". Any packet that is dropped because of the bpf filter is usually not counted in ANY Arkime stats, but some implementations do expose stats.
- Next, packets are checked against the `packet-drop-ips` config section to see if the IPs involved are marked to be discarded. If there are only a few IPs to drop, then `bpf=` should be used; otherwise this is much more efficient than a huge bpf.
- Packets are checked against the `_dropByDst` or `_dropBySrc` timeout; if they match, they will be discarded.
- Packets are then queued to the PacketQ; increase `packetThreads` or `maxPacketsInQueue` if too many packets are being dropped here.
- Sessions are checked against `dontSaveBPFs`, and if one matches, it will save off the max number of packets to save for the session. This will override the `maxPackets` config setting.
- Sessions are checked against `minPacketsSaveBPFs`, which sets a min number of packets that must be received before the session is saved.
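A config.ini sketch tying these stages together; every filter value here is a hypothetical example, and the `minPacketsSaveBPFs` and `[packet-drop-ips]` entry syntax shown is an assumption to verify against the settings page for your version:

```ini
[default]
# single bpf filter controlling what capture sees at all
bpf=not host 10.0.0.1
# save at most 10 packets for matching sessions
dontSaveBPFs=tcp port 443:10
# syntax assumed to mirror the dontSaveBPFs filter:number form
minPacketsSaveBPFs=udp port 53:2

# entry format assumed
[packet-drop-ips]
10.10.0.0/16=drop
```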
PCAP deletion is actually handled by the viewer process, so make sure the viewer process is running on all capture boxes. The viewer process checks on startup and then every minute to see how much space is available, and if it is below freeSpaceG, then it will start deleting the oldest file. The viewer process will log every time a file is deleted, so you can figure out when a file is deleted if you need to. If the viewer complains about not finding the PCAP data, make sure you check the viewer.log.
Note: `freeSpaceG` can also be a percentage; the default is `freeSpaceG=5%`.
The viewer process will always leave at least 10 PCAP files on the disk, so make sure there is room for at least `maxFileSizeG * 10` capture files on disk, or by default 120G.
If still having PCAP delete issues:
- Make sure there is at least `maxFileSizeG * 10` space available.
- Run the `db.pl http://localhost:9200 sync-files` command.
- Check whether the files have `locked` set; viewer won't delete locked files.
- Add `--debug` to the start line or add `debug=1` in the `[default]` section of your config.ini file.
- If SELinux is enabled (check with `sestatus`), temporarily disable it (`setenforce 0`) and see if that fixes the problem.
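As an illustration of the kind of check viewer performs, here is a rough sketch of comparing a directory's free space against a threshold. The directory and threshold values are made up for the example; the real logic lives inside viewer, not in this script:

```shell
#!/bin/sh
# Sketch of a freeSpaceG-style check; /tmp and 1G are illustrative values.
PCAP_DIR=/tmp
THRESHOLD_G=1

# df -Pk prints available 1K blocks in column 4 of the second line
avail_k=$(df -Pk "$PCAP_DIR" | awk 'NR==2 {print $4}')
avail_g=$((avail_k / 1024 / 1024))

if [ "$avail_g" -lt "$THRESHOLD_G" ]; then
  echo "below threshold: viewer would start deleting the oldest files"
else
  echo "ok: ${avail_g}G available"
fi
```

If this kind of check reports plenty of space but viewer still deletes files, double check which filesystem the pcapDir actually lives on.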
There are several common reasons dontSaveBPFs might not work for you:
- Verify the setting is actually being loaded by adding `--debug` to capture when starting and looking at the output.
- If VLAN tagging is in use, a filter like `dontSaveBPFs=tcp port 443:10` won't match the tagged traffic; use something like `dontSaveBPFs=tcp port 443 or (vlan and tcp port 443):10`. Basically `FILTER or (vlan and FILTER)`. Information from here.
- Test the filter outside Arkime, with `tcpdump -i INTERFACE tcp port 443` for example.

If still having issues, you might just try out an Arkime Rules file. Arkime converts dontSaveBPFs into a rule for you behind the scenes, so Arkime Rules are actually more powerful.
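As a sketch of what a rules-file equivalent might look like, the following limits saved packets for TLS sessions. The rule name is made up, and the exact `when` and `ops` values should be verified against the Arkime Rules documentation for your version:

```yaml
---
version: 1
rules:
  - name: "Limit saved packets for tls"   # illustrative name
    when: "fieldSet"
    fields:
      protocols:
      - tls
    ops:
      _maxPacketsToSave: 10
```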
Arkime buffers writes to disk, which is great for high bandwidth networks but bad for low bandwidth networks. How much data is buffered is controlled with `pcapWriteSize`, which defaults to 262144 bytes. An important thing to remember is that the buffer is per thread, so set `packetThreads` to 1 on low bandwidth networks.
A portion of the buffered PCAP will be written after 10 seconds of no writes. However, it will still buffer the last pagesize bytes, usually 4096 bytes.
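For a low bandwidth sensor, the relevant config.ini entries might look like this. Only `packetThreads=1` comes from the guidance above; the smaller `pcapWriteSize` value is an illustrative assumption:

```ini
[default]
# One packet thread means a single write buffer to fill before PCAP hits disk
packetThreads=1
# Smaller buffer flushes PCAP to disk sooner (default is 262144)
pcapWriteSize=65536
```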
An error that looks like
`ERROR - processSessionIdDisk - SESSIONID in file FILENAME couldn't read packet at FILEPOS packet # 0 of 2`
usually means either that the PCAP is still being buffered and you need to wait for it to be written to disk, or that capture or the host previously crashed/restarted before the PCAP could be written to disk.
You can also end up with many zero byte PCAP files if the disk is full, see PCAP Deletion.
In small environments with low amounts of traffic this is possible. With Openvswitch you can create a mirror port from a physical or virtual adapter and send the data to another virtual NIC as the listening interface. In KVM, one issue is that it isn't possible to increase the buffer size past 256 on the adapter when using the Virtio network adapter (mentioned in another part of the FAQ); without that, Arkime capture will continuously crash. To solve this in KVM, use the E1000 adapter, configure the buffer size accordingly, and set up the SPAN port on Openvswitch to send traffic to it: https://www.rivy.org/2013/03/configure-a-mirror-port-on-open-vswitch/.
MaxMind recently changed how you download their free database files. You now need to sign up for an account and set up the geoipupdate program. If using a version of Moloch before 2.2, you will need to edit your config.ini file and update the geolite paths.
Instructions:
- Install the tool: `yum install geoipupdate` or `apt-get install geoipupdate`
- Run `geoipupdate` as root and see if it works
- geoLite2Country is now `/usr/share/GeoIP/GeoLite2-Country.mmdb` and geoLite2ASN is now `/usr/share/GeoIP/GeoLite2-ASN.mmdb`
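For a pre-2.2 Moloch install, the config.ini update would then look roughly like this, using the paths listed above (verify the setting names against your own config file):

```ini
geoLite2Country=/usr/share/GeoIP/GeoLite2-Country.mmdb
geoLite2ASN=/usr/share/GeoIP/GeoLite2-ASN.mmdb
```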
Arkime logs a lot of information for debugging purposes. Much of this information is for bug reports, but it can also be used to figure out what is going on. You may need to use `--debug` to enable these messages.
Jan 01 01:01:01 http.c:369 moloch_http_curlm_check_multi_info(): 8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms
Log Field | Meaning |
---|---|
Jan 01 01:01:01 | Date |
http.c:369 | File name:line number |
moloch_http_curlm_check_multi_info | Function name |
8000/30 | 8000 queued requests to server, 30 connections to server |
ASYNC | Asynchronous request (SYNC for synchronous requests) |
200 | HTTP status code |
http://eshost:9200/_bulk | Requested URL |
250342/5439 | 250342 bytes uploaded (CURLINFO_SIZE_UPLOAD), 5439 bytes downloaded (CURLINFO_SIZE_DOWNLOAD) |
14ms | 14ms to connect to server (CURLINFO_CONNECT_TIME) |
12345ms | 12345ms total request time (CURLINFO_TOTAL_TIME) |
Jan 01 01:01:01 packet.c:1185 moloch_packet_log(): packets: 3911000000 current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 pstats: 4132185901/1/2/3/4/5/6
Log Field | Meaning |
---|---|
Jan 01 01:01:01 | Date |
packet.c:1185 | File name:line number |
moloch_packet_log | Function name |
packets: 3911000000 | 3911000000 packets are going to be processed by the packet queues. These packets have made it past corrupt checks and packet-drop-ips checks, and are ones we most likely understand. |
current sessions: 41771/45251 | 41771 monitored sessions of the current session type (usually tcp), 45251 monitored sessions total |
oldest: 0 | In the current session type queue, the oldest session should be idled out in 0 seconds |
recv: 4028852297 | 4028852297 packets have been received by the interface since process start, as reported by the reader's stats api |
drop: 123 | 123 packets have been dropped by the interface, as reported by the reader's stats api |
(0.00) | 0.00% of packets have been dropped by the interface, as reported by the reader's stats api |
queue: 1 | 1 bulk request is waiting to be sent to the OpenSearch/Elasticsearch servers; each bulk request may hold multiple sessions |
disk: 2 | 2 disk buffer writes are outstanding; each buffer holds multiple packets |
packet: 3 | 3 packets are waiting to be processed in all the packet queues |
close: 4 | 4 tcp sessions have been marked for closing (RST/FIN), waiting on the last few packets |
ns: 5 | 5 sessions are ready to be saved but a plugin is doing async work, such as WISE |
frags: 0/1988 | always 0; 1988 current ip frags waiting to be matched |
pstats: 4132185901/1/2/3/4/5/6 | 4132185901 packets successfully sent to a packet queue; 1 packet dropped because of packet-drop-ips config; 2 packets dropped because the packet queues were overloaded; 3 packets dropped because they were corrupt; 4 packets dropped because we did not know how to process them; 5 packets dropped because of ipport rules; 6 packets dropped because of packet deduping (2.7.1 enablePacketDedup) |
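When eyeballing drop rates across many log lines, the recv and drop counters can be pulled out with a bit of awk. This is a convenience sketch, not an Arkime tool:

```shell
#!/bin/sh
# Extract the recv: and drop: counters from a moloch_packet_log() line.
line='Jan 01 01:01:01 packet.c:1185 moloch_packet_log(): packets: 3911000000 current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00)'
echo "$line" | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i == "recv:") r = $(i+1)
    if ($i == "drop:") d = $(i+1)
  }
  printf "recv=%s drop=%s\n", r, d
}'
# prints: recv=4028852297 drop=123
```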
Click on the owl and read the Search Bar section. The Fields section is also useful for discovering fields you can use in a search expression.
The most common cause of this problem is that the timestamps between the Arkime machines are different. Make sure ntp is running everywhere, or that the time stamps are in sync.
Recent versions of Chrome, Firefox, and Safari should all work fairly equally. Below are the minimum versions required. We aren’t kidding.
Arkime Version | Chrome | Firefox | Opera | Safari | Edge | IE |
---|---|---|---|---|---|---|
Prior to 3.0 | 53 | 54 | 40 | 10 | 14 | Not Supported |
3.0 and beyond | 80 | 74 | 67 | 13.1 | 80 | Not Supported |
Development and testing is done mostly with Chrome on a Mac, so it gets the most attention.
This seems to be caused when proxying requests from one viewer node to another and the machines don't use FQDNs for their hostnames and the short hostnames are not resolvable by DNS. You can check if your machine uses FQDNs by running the `hostname` command. There are several options to resolve the error:
- Use the `--host` option on capture.
- Edit `config.ini` and add a `viewUrl` for each node. This part of the config file must be the same on all machines (we recommend you just use the same config file everywhere). Example:

```ini
[node1_eth0]
interface=eth0
viewUrl=http://node1.fqdn

[node1_eth1]
interface=eth1
viewUrl=http://node1.fqdn

[node2]
interface=eth1
viewUrl=http://node2.fqdn
```
Apache, and other web servers, can be used to provide authentication or other services for Arkime when set up as a reverse proxy. When a reverse proxy is used for authentication it must be inline, and authentication in Arkime will not be used; however, Arkime will still do the authorization. Arkime will use a username that the reverse proxy passes to Arkime as a HTTP header for settings and authorization. See the architecture page for diagrams. While operators will use the proxy to reach the Arkime viewer, the viewer processes still need direct access to each other.
On systems with SELinux, `setsebool -P httpd_can_network_connect 1` is required.
In the Apache config below, `ARKIME_USER` is the header that is being set from a variable; if your auth method already sets a header, use that.

```
AuthType your_auth_method
Require valid-user
RequestHeader set ARKIME_USER %{your_auth_method_concept_of_username_variable_probably_REMOTE_USER}e
SSLProxyEngine On
#ProxyRequests On # You probably don't want this line
ProxyPass /arkime/ https://localhost:8005/ retry=0
ProxyPassReverse /arkime/ https://localhost:8005/
```
In `config.ini`, set `userNameHeader` to the lower case version of the header Apache is setting. NOTE - the userNameHeader setting is only needed on viewers that Apache talks to; don't set it on all of them.
Set `webBasePath` to the ProxyPass location used above. All other sections should NOT have a `webBasePath`.
Set `viewHost=localhost`, so externals can't just set the userNameHeader and access Arkime with no auth:

```ini
[arkime-proxy]
userNameHeader=arkime_user
webBasePath = /arkime/
viewPort = 8005
viewHost = localhost
```

These settings live in an `arkime-proxy` viewer section, so for this example you would need to add `-n arkime-proxy` to your systemd file (/etc/systemd/system/molochviewer.service by default) on the ExecStart line after viewer.js, so viewer uses that section.
Create users with the `addUser.js` script.
If it isn't working:
- Verify that `userNameHeader` is the lower case version of the header Apache is using.
- Start `viewer.js` with `--debug` and see if the header is being sent.
It is possible to search multiple Arkime clusters by setting up a special Arkime MultiViewer and a special MultiES process. The MultiES process is similar to Elasticsearch tribe nodes, except it was created before tribe nodes and can deal with multiple indices having the same name. The MultiViewer talks to MultiES instead of a real OpenSearch/Elasticsearch instance. Currently one big limitation is that all Arkime clusters must use the same serverSecret.
To use MultiES, create another `config.ini` file or section in a shared config file. Both `multies.js` and the special "all" viewer can use the same node name. See Multi Viewer Settings for more information.
```ini
# viewer/multies node name (-n allnode)
[allnode]
# The host and port multies is running on, set with multiESHost:multiESPort, usually just run on the same host
elasticsearch=127.0.0.1:8200
# This is a special multiple arkime cluster viewer
multiES=true
# Port the multies.js program is listening on, elasticsearch= must match
multiESPort = 8200
# Host the multies.js program is listening on, elasticsearch= must match
multiESHost = localhost
# Semicolon-separated list of OpenSearch/Elasticsearch instances, one per arkime cluster. The first one listed will be used for settings
# You MUST have a name set
multiESNodes = http://escluster1.example.com:9200,name:escluster1,prefix:PREFIX;http://escluster2.example.com:9200,name:escluster2
# Uncomment if not using different rotateIndex settings
#queryAllIndices=false
```
Now you need to start up both the `multies.js` program and `viewer.js` with the same config file AND `-n allnode`. All other viewer settings, including `webBasePath`, can still be used.
By default, the users table comes from the first cluster listed in `multiESNodes`. This can be overridden by setting `usersElasticsearch` and optionally `usersPrefix` in the multi viewer config file.
Since 4.2.0 MultiES supports the `caTrustFile` setting.
Prior to 4.2.0, you will need to create a file, for example CAcerts.pem, containing one or more trusted certificates in PEM format. Then start MultiES with the NODE_EXTRA_CA_CERTS environment variable set to the path of the file you just created, for example:

```
NODE_EXTRA_CA_CERTS=./CAcerts.pem /opt/arkime/bin/node multies.js -c /opt/arkime/etc/config.ini -n allnode
```
An admin can change anyone’s password on the Users tab by clicking the Settings link in the Actions column next to the user.
A password can also be changed by using the `addUser` script, which will replace the entire account if the same userid is used. All preferences and views will be cleared, so creating a secondary admin account may be a better option if you need to change an admin user's password. After creating a secondary admin account, change the user's password and then delete the secondary admin account.

```
node addUser -c <configfilepath> <user id> <user friendly name> <password> [--admin]
```
Viewers have the ability to proxy traffic for each other. This relies on Arkime node names being mapped to hostnames. Common problems arise when systems don't use FQDNs or certs don't match.
First, the SPI records are created on the capture side:
- capture gets a nodename, either from the `-n` command line option or from everything in front of the first period of the hostname.
- capture writes a stats record every few seconds that has the mapping from the nodename to the FQDN. It is possible to override the FQDN with the `--host` option to capture.
When PCAP is retrieved from a viewer, it uses the nodename associated with the SPI record to find which capture host to connect to:
- The `arkime-viewer` process gets a nodename, either from the `-n` command line option or from everything in front of the first period of the hostname.
- If the SPI record's nodename matches the `arkime-viewer` nodename, it can be processed locally, STOP HERE. This is the common case with one arkime node.
- If `stats[nodename].hostname` is the same as the `arkime-viewer`'s hostname (exact match), then it can be processed locally, STOP HERE. Remember this is written by capture above, either the FQDN or `--host`. This is the common case with multiple capture processes per capture node.
- If there is a `viewUrl` set in the `[nodename]` section, use that.
- If there is a `viewUrl` set in the `[default]` section, use that.
- Use `stats[nodename].hostname:[nodename section - viewPort setting]`
- Use `stats[nodename].hostname:[default section - viewPort setting]`
- Use `stats[nodename].hostname:8005`
First, look at `viewer.log` on both the viewer machine and the remote machine and see if there are any obvious errors. The most common problems are:
- Using different `config.ini` files on the nodes can make things a pain to debug and sometimes not even work. It is best to use the same config with a different section for each node name (`[nodename]`).
- The remote machine's hostname isn't a FQDN (check with the `hostname` command) AND the viewer machine can't resolve just the short hostname. To fix this, do ONE of the following:
  - Use the `--host` option to capture and restart capture.
  - Change the machine to use a FQDN (run `hostname "fullname"` as root and edit `/etc/sysconfig/network`).
  - Set a `viewUrl` in each node section of the `config.ini`. If you don't have a node section for each host, you'll need to create one.
  - Edit `/etc/resolv.conf` and add `search foo.example.com`, where `foo.example.com` is the subdomain of the hosts.

  Basically, you want it so "telnet shortname 8005" works on the viewer machine to the remote machine.
- Make sure the `dropUser` user or `dropGroup` group can read the PCAP files. Check the directories in the path too.
- Set a `viewUrl` for each node.
- If a node name was changed with the `-n` option, add a `viewUrl` for that old node name that points to the new host.
Arkime uses Node.js for the viewer component and requires many packages to work fully. These packages must be compiled with, and run using, the same version of Node.js. An error like `... was compiled against a different Node.js version using NODE_MODULE_VERSION 48. This version of Node.js requires NODE_MODULE_VERSION 57.` means that the version of Node.js used to install the packages and the version used to run them are different.
This shouldn’t happen when using the prebuilt Arkime releases. If it does, then double check that `/opt/arkime/bin/node` is being used to run viewer.
If you built Arkime yourself, this usually happens if you have a different version of node in your path. You will need to rebuild Arkime and either:
- make sure `/opt/arkime/bin` is in your path before the OS version of node, or
- use the `--install` option to easybutton, which will add it to the path for you.
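To see which node binary actually wins on your PATH, a quick check (assuming the default /opt/arkime/bin install location; adjust if yours differs):

```shell
#!/bin/sh
# Prepend the bundled node location, then ask which node the shell would run.
# /opt/arkime/bin is the default install location; adjust if yours differs.
PATH="/opt/arkime/bin:$PATH"
command -v node || echo "no node binary found in PATH"
```

If the printed path is not under /opt/arkime/bin, the OS node is still shadowing the bundled one.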
By default viewer listens on port 8005. Changing this can be tricky, especially for a port less than 1024, like 443. You should definitely read the How do viewers find each other section.
Scenario | Solutions |
---|---|
Change all nodes to port > 1024 | Set viewPort in the [default] section on ALL nodes |
Change a single node to port < 1024, remaining nodes (if any) unchanged | Usually, unless a program runs as root, it can NOT listen on ports less than 1024. Since viewer by default drops privileges before listening, even if you start it as root, it isn't root anymore when trying to listen on the port. Possible solutions are: |
All nodes, port < 1024 | Just don't. :) If you must, most of the solutions above will work, but don't use the reverse proxy solution, since viewer nodes need to talk to each other WITHOUT external authentication. |
For Hunts to work properly you must set `cronQueries=true` on one and only one node.
If cronQueries is properly set up on a single node, and hunts still aren't working, make sure the cronQueries node is running and checking in. You can check this on the Stats -> ES Nodes tab and/or check the viewer logs.
See more information about the cronQueries setting here.
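A minimal sketch of the setting, assuming a single-viewer install where the `[default]` section applies to the node chosen to run cron queries:

```ini
[default]
# Set on exactly one node; hunts and periodic queries run here
cronQueries=true
```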
Parliament is designed to run behind a reverse proxy such as Apache. Basically, you just need to tell Apache to send all root requests and any `/parliament` requests to the Parliament server:

```
ProxyPassMatch ^/$ http://localhost:8008/parliament retry=0
ProxyPass /parliament/ http://localhost:8008/parliament/ retry=0
```
Here is the common check list:
- Verify WISE is running by testing it on the WISE host: `curl http://localhost:8081/fields`. You should see a list of fields that WISE knows about.
- Verify capture has `wise.so` added to the `plugins=` line.
- Verify viewer has `wise.js` added to the `viewerPlugins=` line.
- Verify `wiseURL` has been set, or the older `wiseHost` and `wisePort`.
- From the capture host, verify WISE is reachable: `curl http://WISEHOST:8081/fields`.
- Restarting capture after adding a `--debug` option may print out useful information about what is wrong. Look to make sure that WISE is being called with the correct URL, and verify that the plugins, wiseHost and wiseURL settings are what you actually think they are.
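Putting the checklist's settings together, the capture and viewer config entries look roughly like this; the URL is an example value:

```ini
# capture loads the WISE plugin
plugins=wise.so
# viewer loads the WISE plugin
viewerPlugins=wise.js
# where capture and viewer reach the WISE service
wiseURL=http://localhost:8081
```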
Want to add or edit this FAQ? Found an issue on this site? This site's code is open source. Please contribute!