General

Why should I use Arkime?

If you want a standalone open source full packet capture (FPC) system with metadata parsing and searching, then Arkime may be your answer! Arkime gives you complete control of deployment and architecture. There are other FPC systems available.

Why change our name?

This project has experienced significant growth, adoption, and change over the last eight years. We are now at a new milestone and believe it’s the right time to rename our project to Arkime! Read more about why we made this change here.

How do you pronounce our name?

Arkime is pronounced /ɑːrkɪˈmi/. Read more about why we changed our name here.

Upgrading Arkime

Upgrading Arkime requires you to install versions in order, as described in the chart below. If the version you are currently on isn’t listed, upgrade to the next higher version in the chart; you can then install the major releases in order to catch up. New installs can start from the latest version.

Name | Version | Min upgrade from | ES versions | Special instructions | Notes
Moloch | 2.2+ | 1.7.0 (1.8.0 recommended) | 6.8.2+ (6.8.6+ recommended), 7.1+ (7.8.0+ recommended, 7.7.0 broken) | Moloch 2.0 instructions | Must already be on ES 6.8.x or 7.1+ before upgrading to 2.2
Moloch | 2.0, 2.1 | 1.7.0 (1.8.0 recommended) | 6.7, 6.8, 7.1+ | Moloch 2.0 instructions | Must already be on ES 6.7 or 6.8 (ES 6.8.6 recommended) before upgrading to 2.0
Moloch | 1.8 | 1.0.0 (1.1.x recommended) | 5.x or 6.x | ES 6 instructions | Must have finished the 1.x reindexing; stop captures for best results
Moloch | 1.1.1 | 0.20.2 (0.50.1 recommended) | 5.x or 6.x (new only) | Instructions | Must be on ES 5 already
Moloch | 0.20.2 | 0.18.1 (0.20.2 recommended) | 2.4, 5.x | ES 5 instructions | -
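
You can read the chart as a small lookup table. The sketch below illustrates that reading and is not an official tool. The helper names are hypothetical. It models the "2.0, 2.1" row as 2.0 and the "2.2+" row as 2.2. It ignores the Elasticsearch version requirements, so you still need to check the ES versions column yourself.

```python
# Hypothetical helper: the upgrade chart above as data.
# Each row is (target version, minimum version you can upgrade from).
UPGRADE_CHART = [
    ("0.20.2", "0.18.1"),
    ("1.1.1", "0.20.2"),
    ("1.8", "1.0.0"),
    ("2.0", "1.7.0"),  # the "2.0, 2.1" row
    ("2.2", "1.7.0"),  # the "2.2+" row
]


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn '1.8' or '1.1.1' into a comparable 3-tuple, padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def upgrade_path(current: str) -> list[str]:
    """Return the chart versions newer than `current`, in install order.

    Raises ValueError if a step's minimum upgrade-from version is not met.
    """
    installed = parse_version(current)
    steps = []
    for target, min_from in UPGRADE_CHART:
        if parse_version(target) <= installed:
            continue  # already at or past this release
        if installed < parse_version(min_from):
            raise ValueError(f"cannot reach {target}: need {min_from} or later first")
        steps.append(target)
        installed = parse_version(target)
    return steps


print(upgrade_path("1.1.1"))  # ['1.8', '2.0', '2.2']
```

Take a version that isn't in the chart, such as 1.5.3. The helper starts at the next higher chart row (1.8), which matches the instructions above.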

What OSes are supported?

We have RPMs/DEBs available on the downloads page. Our deployment is on CentOS 7 with the elrepo 4.x kernel installed for packet performance increases and better afpacket support. A large amount of development is done on Mac OS X 10.15 using MacPorts or Homebrew; however, it has never been tested in a production setting. :) Arkime is NOT supported on 32-bit machines.

The following OSes should work out of the box if you compile Arkime yourself:

Arkime is not working

Here is the common check list:

  1. Check that Elasticsearch is running and green using
    curl http://localhost:9200/_cat/health
    on the machine running Elasticsearch.
  2. Check that the db has been initialized with /data/moloch/db/db.pl http://ESHOST:9200 info
  3. Check that viewer is reachable by visiting http://arkime-viewer.hostname:8005 from your browser
    1. If it doesn’t render, looks strange or warns of an old browser, use a newer supported browser
  4. Check for errors in /data/moloch/logs/viewer.log and that viewer is running with pgrep -lf viewer
  5. Check for errors in /data/moloch/logs/capture.log and that capture is running with pgrep -lf capture
  6. Check that the stats page shows the capture nodes you are expecting, visit http://arkime-viewer.hostname:8005/stats?statsTab=1 in your browser.
    1. Make sure the nodes are showing packets being received
    2. Make sure the timestamp for nodes is recent (within 5 seconds)
  7. Disable any bpf= setting in /data/moloch/etc/config.ini; if that fixes the issue, read the BPF FAQ answer
  8. If the browser shows "Oh no, Arkime is empty! There is no data to search." but the stats tab shows packets are being captured:
    1. With live capture, Arkime only writes records when a session has ended, so it will take several minutes for sessions to show up after a fresh start. See /data/moloch/etc/config.ini to shorten the timeouts
    2. With the default Arkime config, Elasticsearch only refreshes the indices once a minute; force a refresh with curl http://ESHOST:9200/_refresh
    3. Verify your time frame for search covers the data (try switching to ALL)
    4. Check that you don’t have a view set
    5. Check that your user doesn’t have a forced expression set; you might need to ask your Arkime admin
  9. If you are having packet capture issues, restart moloch-capture with a --debug option; it may print useful information about what is wrong. You can add multiple --debug options to get even more information. Capture prints the config settings it is using, so verify they are what you expect.
  10. If you are having issues viewing packets that were captured, restart viewer with a --debug option; it may print useful information about what is wrong
    1. Make sure the plugins and parsers directories are correctly set in /data/moloch/etc/config.ini and readable by the viewer process
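
You can script step 1 of the checklist above. This is a minimal sketch. It assumes the default _cat/health column layout (no ?v header), where the cluster status is the fourth field. The names fetch_cat_health and cluster_status are hypothetical helpers.

```python
import urllib.request


def fetch_cat_health(es_url: str = "http://localhost:9200") -> str:
    """Return the raw text of GET /_cat/health (same as the curl in step 1)."""
    with urllib.request.urlopen(f"{es_url}/_cat/health", timeout=5) as resp:
        return resp.read().decode().strip()


def cluster_status(cat_health_line: str) -> str:
    """Pick the status column out of a _cat/health line (no ?v header).

    Columns start: epoch, timestamp, cluster, status, ...
    """
    fields = cat_health_line.split()
    if len(fields) < 4:
        raise ValueError(f"unexpected _cat/health output: {cat_health_line!r}")
    return fields[3]


# Example line in the default format; on a live cluster use
# cluster_status(fetch_cat_health()) instead.
sample = "1475871424 16:17:04 elasticsearch green 1 1 1 1 0 0 0 0 - 100.0%"
print(cluster_status(sample))  # green
```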

How do I reset Arkime?

  1. Leave Elasticsearch running
  2. Shutdown all running viewer or capture processes so no new data is recorded.
  3. To delete all the SPI data stored in Elasticsearch, use the db.pl script with either the init or wipe commands. The only difference between the two commands is that wipe leaves the added users so they don’t need to be re-added.
    /data/moloch/db/db.pl ESHOST:ESPORT wipe
  4. Delete the PCAP files. The PCAP files are stored on the file system in raw format. You need to do this on all of the capture machines.
    /bin/rm -f /data/moloch/raw/*

Self-Signed SSL/TLS Certificates

It is possible to get self-signed certificates to work in the following scenarios:

Usually the easiest way is to add the self-signed cert to the OS's list of valid certificates or chains. Googling is the best way to figure out how to do this for your OS. Viewer also supports a caTrustFile option.

The core Arkime team does not support or recommend self-signed certs. Use the money you are saving on a commercial product and go buy real certs. Wildcard certs are now cheap, and you can even use free Let's Encrypt certs. There may be folks on the Arkime Slack workspace willing to help out.

Both capture and viewer can run with --insecure to turn off cert checking. You will need to add this option to the startup command for both capture and viewer, for example in /etc/systemd/system/arkimecapture.conf.

How do I upgrade to Moloch 1.0

Moloch 1.0 has some large changes and updates that will require all session data to be reindexed. The reindexing is done in the background AFTER upgrading so there is little downtime. Large changes in 1.0 include:

If you have any special parsers, tagger, plugins, or WISE sources, you may have to change their configurations.

To upgrade:

Once 1.1.1 is working, it’s time to reindex the old session data:

How do I upgrade to Moloch 2.0

Upgrading to Moloch 2.0 is a multistep process that requires an outage. An outage is required because all the captures must be stopped before upgrading the database so there are no schema issues or corruption. Most of the administrative indices will have new version numbers after this upgrade, so that elasticsearch knows they were created with 6.7 or 6.8. This is very important when upgrading to ES 7.x later.

After verifying 2.0 is working fine while still using Elasticsearch 6, you can upgrade to Elasticsearch 7 using these instructions.


Elasticsearch

How many Elasticsearch nodes or machines do I need?

The answer, of course, is "it depends". Factors include:

Some important things to remember when designing your cluster:

We have some estimators that may help.

The good news is that it is easy to add new nodes in the future, so feel free to start with fewer nodes. As a temporary fix for capacity problems, you can reduce the number of days of metadata that are stored. You can use the Arkime ES Indices tab to delete the oldest sessions2 index.

Data never gets deleted

The SPI data in Elasticsearch and the PCAP data are not deleted at the same time. The PCAP data is deleted as the disk fills up on the capture machines, more info. PCAP deletion happens automatically and nothing needs to be done. The SPI data is deleted when the ./db.pl expire command is run, usually from cron during off peak. There is a sample in the daily.sh script. The SPI data deletion does NOT happen automatically and a cron job MUST be set up.

So deleting a PCAP file will NOT delete the SPI data, and deleting the SPI data will not delete the PCAP data from disk.
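
The sketch below shows what an expire run means for SPI data by listing the daily indices that fall outside a retention window. It is not db.pl's implementation. It assumes daily indices named sessions2-YYMMDD and keeps the newest N of them. Hourly, weekly and monthly index names are not handled.

```python
import re

# Assumed daily index naming: sessions2-YYMMDD, e.g. sessions2-210314.
DAILY_SESSIONS = re.compile(r"^sessions2-\d{6}$")


def indices_to_expire(index_names: list[str], keep_days: int) -> list[str]:
    """Return daily sessions2 indices older than the newest `keep_days` ones.

    YYMMDD suffixes sort chronologically as strings, so a plain sort works.
    """
    daily = sorted(name for name in index_names if DAILY_SESSIONS.match(name))
    if keep_days <= 0:
        return daily
    return daily[:-keep_days]


names = ["sessions2-210312", "sessions2-210314", "sessions2-210313", "some-other-index"]
print(indices_to_expire(names, 2))  # ['sessions2-210312']
```

In production, keep letting the db.pl expire cron job do the actual deletion. This sketch only helps you reason about what a retention setting keeps.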

The UI does have commands to delete and scrub individual sessions, but the user must have the Remove Data ability on the users tab. Usually this feature is used for things you don’t want operators to see, such as bad images.

ERROR - Dropping request /_bulk

This error almost always means that your Elasticsearch cluster can not keep up with the amount of sessions that the capture nodes are trying to send it. You may only see the error message on your busiest capture nodes since capture tries to buffer the requests.

Some things to check:

If these don’t help, you need to add more nodes or reduce the number of sessions being monitored. You can reduce the number of sessions with packet-drop-ips or bpf filters or rules files for example.

When do I add additional nodes? Why are queries slow?

If queries are too slow the easiest fix is to add additional Elasticsearch nodes. Elasticsearch doesn’t do well if Java hits an OutOfMemory condition. If you ever have one, you should immediately delete the oldest sessions2-* index, update the daily.sh script to delete more often, and restart the Elasticsearch cluster. Then you should order more machines. :)

Removing nodes

How to enable Elasticsearch replication

Turning on replication will consume twice the disk space on the nodes and increase the network bandwidth between nodes, so make sure you actually need replication.

To change future days

db/db.pl <ESHOST:ESPORT> upgrade --replicas 1

To change past days, but not the current day

db/db.pl <ESHOST:ESPORT> expire <type> <num> --replicas 1

We recommend the second solution since it allows current traffic to be written to ES once, and the previous days' traffic will be replicated during off-peak hours.
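
A rough sizing calculation shows why replication needs care. This sketch assumes each replica is a full extra copy of the primary data, which is how Elasticsearch replicas work. The 200 GB/day figure is hypothetical. With the expire --replicas 1 approach, the current day stays unreplicated, so the real total is slightly lower than this upper bound.

```python
def cluster_disk_needed(primary_per_day: float, days: int, replicas: int = 0) -> float:
    """Disk used by SPI data across the whole cluster.

    Each replica is a full copy of the primary shards, so the total scales
    with (1 + replicas). Units match `primary_per_day` (e.g. GB).
    """
    return primary_per_day * days * (1 + replicas)


# Hypothetical cluster: 200 GB/day of SPI data kept for 30 days.
print(cluster_disk_needed(200, 30))              # 6000
print(cluster_disk_needed(200, 30, replicas=1))  # 12000, twice the disk
```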

How do I upgrade Elasticsearch?

In general if upgrading between minor or build versions of Elasticsearch you can do a rolling upgrade with no issues. Follow Elastic's instructions for best results. Make sure you select the matching version of that document for your version of ES from the pull down on the right hand side.

Upgrading between major versions of Elasticsearch usually requires an upgrade of Arkime, so please see the instructions in this FAQ for

How do I upgrade to ES 7.x

  1. If you are NOT using Arkime DB version 63 (or later) you must follow these instructions while still using ES 6.x and upgrade to Arkime 2.0. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Arkime.
  2. Make sure your Elasticsearch config files are ready for ES 7. We do not provide sample Elasticsearch configs, but here are some things to look out for:
  3. Now you need to upgrade from ES 6 to ES 7. There are two options:
    1. Upgrading to ES 7 from ES 6.8.6 (or later) can be done with a rolling upgrade. Follow Elastic's instructions for best results. You do NOT need to stop capture/viewer, but after the rolling upgrade is finished you may want to restart capture everywhere.
    2. If you are not using ES 6.8.6, or if you would prefer to do a full restart, follow the instructions below
      • Make sure you delete any old indices that db.pl complained about when you installed Moloch 2.0
      • Shutdown everything: elasticsearch, capture, and viewer
      • Upgrade ES to 7.x (7.8.0 or later recommended)
      • Start ES cluster
      • Wait for the cluster to go GREEN; this will take LONGER than usual as ES upgrades things from 6.x to 7.x format.
        curl http://localhost:9200/_cat/health
      • Start viewers and captures

How do I upgrade to ES 6.x

ES 6.x is supported by Moloch 1.x for NEW clusters and >= 1.5 for UPGRADING clusters.

NOTE - If upgrading, you must FIRST upgrade to Moloch 1.0 or 1.1 (1.1.1 recommended) before upgrading to > 1.5. Also, all reindex operations need to be finished.

We do NOT provide ES 6 startup scripts or configuration, so if upgrading please make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to ES 6 will REQUIRE two downtimes

First outage: If you are NOT using Moloch DB version 51 (or later) you must follow these steps while still using ES 5.x. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Moloch.

Second outage: Upgrade to ES 6

How do I upgrade to ES 5.x

ES 5.x is supported by Moloch 0.17.1 for NEW clusters and 0.18.1 for UPGRADING clusters.

ES 5.0.x, 5.1.x and 5.3.0 are NOT supported because of ES bugs/issues. We currently use 5.6.7.

WARNING - If you have sessions-* indices created with ES 1.x, you can NOT upgrade. Those indices will need to be deleted.

We do NOT provide ES 5 startup scripts, so if upgrading please make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to ES 5 may REQUIRE 2 downtime periods of about 5-15 minutes each.

First outage: If you are NOT using Moloch DB version 34 (or later) you must follow these steps while still using ES 2.4. To find what DB version you are using, either run db.pl http://ESHOST:9200 info or mouse over the in Moloch.

Second outage: Upgrade to ES 5

version conflict, current version [N] is higher or equal to the one provided [M]

This error usually happens when the capture process is trying to update the stats data and falls behind. If it happens regularly it may indicate the elasticsearch process is having a hard time keeping up with indexing.

Here are some of our recommended Elasticsearch settings. Many of these can be updated on the fly, but it is still best to put them in your elasticsearch.yml file. We strongly recommend using the same elasticsearch.yml file on all hosts, things that need to be different per host can be set with variables.

Disk Watermark

You will probably want to change the watermark settings so you can use more of your disk space. You have the option to use ALL percentages or ALL values, but you can’t mix them. The most common sign of a problem with these settings is an error that has FORBIDDEN/12/index read-only / allow delete in it. You can use ./db.pl http://ESHOST:9200 unflood-stage _all to clear the error, once you adjust the settings and/or delete some data. Elasticsearch Docs

cluster.routing.allocation.disk.watermark.low: 97%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99%

or if you want more control use values instead of percentages:

cluster.routing.allocation.disk.watermark.low: 300gb
cluster.routing.allocation.disk.watermark.high: 200gb
cluster.routing.allocation.disk.watermark.flood_stage: 100gb
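
You can check the rules above before rolling out elasticsearch.yml. This is a minimal sketch. It assumes values are written as in the examples: either a percentage such as 97%, or a byte size such as 300gb using Elasticsearch's binary units. It ignores other forms Elasticsearch accepts, such as ratios. The ordering check follows from what each form measures: percentages describe disk used, while byte values describe disk left free, which is why the byte example counts down.

```python
import re

# Elasticsearch byte units are binary (1kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_watermark(value: str) -> tuple[str, float]:
    """Return ("percent", used %) or ("bytes", free bytes)."""
    value = value.strip().lower()
    if value.endswith("%"):
        return "percent", float(value[:-1])
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb)", value)
    if match is None:
        raise ValueError(f"unrecognised watermark value: {value!r}")
    return "bytes", float(match.group(1)) * UNITS[match.group(2)]


def check_watermarks(low: str, high: str, flood_stage: str) -> None:
    """Raise ValueError if the three watermark settings are inconsistent."""
    kinds, amounts = zip(*(parse_watermark(v) for v in (low, high, flood_stage)))
    if len(set(kinds)) != 1:
        raise ValueError("use ALL percentages or ALL byte values, not a mix")
    lo, hi, fl = amounts
    if kinds[0] == "percent" and not lo <= hi <= fl:
        # Percentages are disk USED, so they must rise toward flood_stage.
        raise ValueError("percentages must satisfy low <= high <= flood_stage")
    if kinds[0] == "bytes" and not lo >= hi >= fl:
        # Byte values are disk FREE, so they must fall toward flood_stage.
        raise ValueError("byte values must satisfy low >= high >= flood_stage")


check_watermarks("97%", "98%", "99%")        # passes
check_watermarks("300gb", "200gb", "100gb")  # passes
```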

Shard Limit

If you have a lot of shards that you want to be able to search against at once, raise the shard count limit. Elasticsearch Docs

action.search.shard_count.limit: 100000

Write Queue Limit

If you hit a lot of bulk failures this can help, but Elastic doesn’t recommend raising it too much. In older versions of Elasticsearch it is named thread_pool.bulk.queue_size, so check the docs for your version. Elasticsearch Docs

thread_pool.write.queue_size: 2000

HTTP Compression

On by default in most versions, allows for HTTP compression. Elasticsearch Docs

http.compression: true

Recovery Time

To speed up recovery times and startup times there are a few controls to play with. Make sure you test them in your environment, and slowly increase them, because they can break things badly. Elasticsearch Allocation Docs and Elasticsearch Recovery Docs

cluster.routing.allocation.cluster_concurrent_rebalance: 10
cluster.routing.allocation.node_concurrent_recoveries: 5
cluster.routing.allocation.node_initial_primaries_recoveries: 5
indices.recovery.max_bytes_per_sec: "400mb"

Logging

By default Elasticsearch sets the action logger to debug level in log4j2.properties. For busy clusters, change it to info level to lower CPU and disk usage.


logger.action.level = info

Using ILM with Arkime

Since Moloch 2.2 you can easily use ILM to handle moving indices from hot to warm, force merge, and delete actions. We recommend only using ILM with newer versions (7.2+) of Elasticsearch, since older versions did have some issues. Once ILM is enabled you no longer have to use the db.pl expire cron job, but should occasionally run db.pl optimize-admin.

ILM is included in the free "basic" Elasticsearch license but is not part of the Elasticsearch OSS distribution, so you may need to switch from the OSS distribution. Arkime does NOT currently support the auto rollover feature of ILM, for performance reasons when searching.

There are four important steps to have ILM work correctly with Arkime.

  1. Create the arkimesessions and arkimehistory ILM policies. This can be done with Kibana, but we recommend the db.pl ilm command.
  2. Assign the arkimesessions and arkimehistory ILM policies to all the existing indices. Kibana or db.pl ilm can perform this action.
  3. Change the Arkime templates to use the ILM policies for NEW indices. You'll need to rerun db.pl upgrade ... --ilm, adding --ilm to the arguments you usually use
  4. Replace the previous db.pl expire cron job with db.pl optimize-admin

For example, to create new policies that keep 30 weeks of history and 90 days of SPI data, with 1 replica, and that optimize all indices older than 25 hours, you would run:

./db.pl http://localhost:9200 ilm 25h 90d --history 30 --replicas 1

You would then need to run upgrade with all the arguments you usually use plus --ilm:

./db.pl http://localhost:9200 upgrade --replicas 1 --shards 5 --ilm


Capture

What kind of capture machines should we buy?

The goal of Arkime is to use commodity hardware. If you start thinking about using SSDs or expensive NICs, research if it would just be cheaper to buy one more box. This gains more retention and can bring the cost of each machine down.

Some things to remember when selecting a machine:

When selecting Arkime capture boxes, standard "Big Data" boxes might be the best bet. ($10k-$25k each) Look for:

We are big fans of using Network Packet Brokers ($6k+). They allow multiple taps/mirrors to be aggregated and load balanced across multiple moloch-capture machines. Read more below.

What kind of Network Packet Broker should we buy?

We are big fans of using Network Packet Brokers. If there is one piece of advice we can give medium or large Arkime deployments, it is: use an NPB. See MolochON 2017 NPB Preso

Main Advantages:

Features to look for

Just like running Arkime on commodity hardware, you don’t necessarily have to pay a lot of money for a good NPB. Some switch vendors have switches that can operate in switch mode or NPB mode, so you might already have gear lying around that you can use.

Sample vendors

What kind of packet capture speeds can arkime-capture handle?

On basic commodity hardware, it is easy to get 3Gbps or more, depending on the number of CPUs available to Arkime and what else the machine is doing. Many times the limiting factor can be the speed of the disks and RAID system. See Architecture and Multiple Host for more information. Arkime allows multiple threads to be used to process the packets.

To test the local RAID device use:

dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct

To test a NAS, leave off oflag=direct and test with at least 3x the amount of memory so the cache isn't a factor:

dd bs=256k count=150000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test

This is the MAX disk performance. Run it several times if desired and take the average. If you don’t want to drop any packets, you shouldn’t average more than ~80% of the MAX disk performance. If you are using RAID and don’t want to drop packets during a future rebuild, ~60% is a better value. Remember that most network numbers will be in bits while the disk performance will be in bytes, so you’ll need to convert the values before comparing.
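
Here is the arithmetic above as a small sketch. It assumes dd reports decimal megabytes per second (MB/s) and that you plan in gigabits per second. The 64 GiB host and the 1000 MB/s dd result are hypothetical numbers, and the helper names are illustrative.

```python
import math

BLOCK_SIZE = 256 * 1024  # dd bs=256k


def dd_count_for_nas(ram_bytes: int, multiple: int = 3) -> int:
    """Number of 256k blocks so the NAS test writes `multiple` x RAM."""
    return math.ceil(multiple * ram_bytes / BLOCK_SIZE)


def sustainable_gbps(disk_mb_per_s: float, headroom: float = 0.8) -> float:
    """Network rate (Gbit/s) to plan for, given dd's MB/s and a headroom factor."""
    bits_per_s = disk_mb_per_s * 1_000_000 * 8  # bytes -> bits
    return bits_per_s * headroom / 1e9


print(dd_count_for_nas(64 * 1024**3))  # 786432
print(sustainable_gbps(1000))          # 6.4, i.e. 80% of 8 Gbit/s
print(sustainable_gbps(1000, 0.6))     # about 4.8, leaving room for RAID rebuilds
```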

Arkime requires full packet captures error

When you get an error about the capture length not matching the packet length, it is NOT an issue with Arkime. The issue is with the network card settings.

By default modern network cards offload work that the CPUs would need to do. They will defragment packets or reassemble tcp sessions and pass the results to the host. However this is NOT what we want for packet captures, we want what is actually on the network. So you will need to configure the network card to turn off all the features that hide the real packets from Arkime.

The sample config files (/data/moloch/bin/arkime_config_interfaces.sh) turn off many common features but there are still some possible problems:

  1. If using a VM for Arkime, you need to turn off the features on the physical interface the VM interface is mapped to
  2. If using a fancy card there may be other features that need to be turned off.
    1. You can usually find them with ethtool -k INTERFACE | grep on. Turn off anything that is still on and see if that fixes the problem
    2. For example ethtool -K INTERFACE tx off sg off gro off gso off lro off tso off
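
The "anything that is still on" check can be automated. This minimal sketch parses the text printed by ethtool -k. Features marked [fixed] cannot be toggled, so it skips them. The helper names and the sample output are illustrative. Note that ethtool -k prints long feature names, while the -K examples above use short aliases such as sg and gro.

```python
import subprocess


def ethtool_features(interface: str) -> str:
    """Run `ethtool -k INTERFACE` and return its text output."""
    return subprocess.run(["ethtool", "-k", interface],
                          capture_output=True, text=True, check=True).stdout


def features_still_on(ethtool_k_output: str) -> list[str]:
    """Feature names that are on and not marked [fixed]."""
    enabled = []
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank or unexpected line
        value = value.strip()
        if value.startswith("on") and "[fixed]" not in value:
            enabled.append(name)
    return enabled


SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
\ttx-checksum-ipv4: off [fixed]
scatter-gather: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
"""
print(features_still_on(SAMPLE))
# ['rx-checksumming', 'tx-checksumming', 'scatter-gather', 'generic-receive-offload']
```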

There are two workarounds:

  1. If you are reading from a file you can set readTruncatedPackets=true in the config file, this is the only solution for saved .pcap files
  2. You can increase the max packet length with snapLen=65536 in the config file, this is not recommended

Why am I dropping packets?

There are several different types of packet drops and reasons for packet drops:

Arkime Version

Please make sure you are using a recent version of Arkime. Constant improvements are made and it is hard for us to support older versions.

Kernel and TPACKET_V3 support

The most common cause of packet drops with Arkime is leaving the reader at the libpcap default instead of switching to tpacketv3, pfring or one of the other high performance packet readers. We strongly recommend tpacketv3, but it does require a kernel of 3.2 or later. See plugin settings for more information.

For those stuck on CentOS 6, use elrepo and install kernel-ml on the machines that will RUN moloch-capture. Install kernel-ml-headers on the machines that will COMPILE Arkime. Download the packages from http://elrepo.org/linux/kernel/el6/x86_64/RPMS/. The prebuilt Arkime RPMs have already been compiled on a machine with a newer kernel.

Network Card Config

Make sure the network card is configured correctly by increasing the ring buf to max size and turning off most of the card’s features. The features are not useful anyway, since we want to capture what is on the network instead of what the local OS sees. Example configuration:

# Set ring buf size, see max with ethtool -g eth0
ethtool -G eth0 rx 4096 tx 4096
# Turn off features, see available features with ethtool -k eth0
ethtool -K eth0 rx off tx off sg off tso off gso off gro off lro off

If Arkime was installed from the deb/rpm and the Configure script was used, this should already be done in /data/moloch/bin/arkime_config_interfaces.sh

packetThreads and the PacketQ is overflowing error

The packetThreads config option controls the number of threads processing the packets, not the number of threads reading the packets off the network card. You only need to change the value if you are getting the Packet Q is overflowing error. The packetThreads option is limited to 24 threads, but usually you only need a few. Configuring too many packetThreads is actually worse for performance, please start with a lower number and slowly increase. You can also change the size of the packet queue by increasing the maxPacketsInQueue setting.

To increase the number of threads the reader uses please see the documentation for the reader you are using on the settings page.

Disk

Make sure swap has been disabled, swappiness is 0, or at the very least, isn’t writing to the disk being used for PCAP.

Make sure the RAID isn’t in the middle of a rebuild or something worse. Most RAID cards will have a status of OPTIMAL when things are all good and DEGRADED or SUBOPTIMAL when things are bad.

To test the RAID device use:

dd bs=256k count=50000 if=/dev/zero of=/THE_ARKIME_PCAP_DIR/test oflag=direct

If you are using xfs make sure you use mount options defaults,inode64,noatime

  • Don’t run capture and Elasticsearch on the same machine.
  • Make sure you actually have enough disk write throughput and disks. For example, for a 1G link with RAID 5 you may need:
    • At least 4 spindles if using a RAID 5 card with write cache enabled.
    • At least 8 spindles (or more) if using a RAID 5 card with write cache disabled.
  • Make sure your RAID card can actually handle the write rate. Many onboard RAID 5 controllers can not handle sustained 1G write rates.
  • Switch to RAID 0 from RAID 5 if you can live with the TOTAL data loss on a single disk failure.

If using EMC for disks:

  • Make sure write cache is enabled for the LUNs.
  • If it is a CX with SATA drives, RAID-3 is optimized for large sequential I/O.
  • Monitor EMC lun queue depth, may be too many hosts sharing it.

To check your disk IO run iostat -xm 5 and look at the following:

  • wMB/s will give you the current write rate, does it match up with what you expect?
  • avgqu-sz should be near or less than 1; otherwise Linux is queueing IO instead of doing it.
  • await should be near or less than 10 (ms); otherwise the IO system is slow, which will slow moloch-capture down.
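
You can apply these rules of thumb to a line of iostat output in code. This is a minimal sketch with hypothetical helper names and a made-up sample line. Column names differ between sysstat versions. The sketch assumes the older avgqu-sz/await names and also accepts the newer aqu-sz. When there is no await column, it falls back to w_await, since PCAP writing is what matters here.

```python
def parse_iostat_line(header: str, row: str) -> dict[str, float]:
    """Zip one `iostat -xm` header line with one device line.

    The first column (Device / device name) is skipped on both lines.
    """
    names = header.split()[1:]
    values = [float(v) for v in row.split()[1:]]
    return dict(zip(names, values))


def iostat_problems(stats: dict[str, float]) -> list[str]:
    """Apply the rules of thumb: queue size <= ~1 and await <= ~10 ms."""
    # Older sysstat prints avgqu-sz/await; newer prints aqu-sz and r_/w_await.
    queue = stats.get("avgqu-sz", stats.get("aqu-sz"))
    wait = stats.get("await", stats.get("w_await"))
    problems = []
    if queue is not None and queue > 1:
        problems.append(f"queue size {queue} > 1: Linux is queueing IO")
    if wait is not None and wait > 10:
        problems.append(f"await {wait} ms > 10: the IO system is slow")
    return problems


HEADER = "Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util"
ROW = "sda 0.00 2.00 0.00 150.00 0.00 120.00 1638.40 3.50 23.00 0.00 23.00 2.00 30.00"

stats = parse_iostat_line(HEADER, ROW)
print(stats["wMB/s"])  # 120.0
for problem in iostat_problems(stats):
    print(problem)
```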

Other things to do/check:

  • If using RAID 5 make sure you have write cache enabled on the RAID card.
    • Adaptec Example: arcconf SETCACHE 1 LOGICALDRIVE 1 WBB
    • HP Example: hpssacli ctrl slot=0 modify dwc=enable
  • You may want to use taskset to give moloch-capture its own CPU, although with the newer pcapWriteMethod thread or thread-direct settings this isn’t needed and may hurt.

Other

  • There are conflicting reports that disabling irqbalance may help.
  • Check that the CPU you are giving moloch-capture isn’t handling lots of interrupts (cat /proc/interrupts).
  • Make sure other processes aren’t using the same CPU as moloch-capture.

WISE

  • Cyclical packet drops may be caused by bad connectivity to the wise server. Verify that the wiseService responds quickly
    curl http://arkime-wise.hostname:8081/views
    on the moloch-capture host that is dropping packets.

High Performance Settings

See settings

How do I import existing PCAPs?

Think of the moloch-capture binary much like you would tcpdump. moloch-capture can listen to live network interface(s), or read from historic packet capture files. Currently Arkime works best with PCAP files, not pcapng.

${arkime_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]

For an entire directory, use -R [pcap directory]

See ${arkime_dir}/bin/moloch-capture --help for more info
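
Arkime currently works best with classic PCAP rather than pcapng, so it can help to check a file's format before importing it. This minimal sketch reads the first four bytes. A classic pcap file starts with the magic number 0xa1b2c3d4, or 0xa1b23c4d for nanosecond timestamps, stored in the writer's byte order. A pcapng file starts with the Section Header Block type 0x0a0d0d0a. The helper names are hypothetical.

```python
from pathlib import Path

# Classic pcap magic numbers are written in the capturing host's byte order.
PCAP_MAGICS = {
    bytes.fromhex("d4c3b2a1"),  # microsecond timestamps, little-endian
    bytes.fromhex("a1b2c3d4"),  # microsecond timestamps, big-endian
    bytes.fromhex("4d3cb2a1"),  # nanosecond timestamps, little-endian
    bytes.fromhex("a1b23c4d"),  # nanosecond timestamps, big-endian
}
# pcapng files start with a Section Header Block, block type 0x0A0D0D0A.
PCAPNG_MAGIC = bytes.fromhex("0a0d0d0a")


def classify_magic(first_four: bytes) -> str:
    if first_four in PCAP_MAGICS:
        return "pcap"
    if first_four == PCAPNG_MAGIC:
        return "pcapng"
    return "unknown"


def capture_format(path: Path) -> str:
    with open(path, "rb") as f:
        return classify_magic(f.read(4))


def pcapng_files(directory: Path) -> list[Path]:
    """Files under `directory` that are pcapng rather than classic pcap."""
    return sorted(p for p in directory.rglob("*")
                  if p.is_file() and capture_format(p) == "pcapng")
```

For example, pcapng_files(Path("/data/import")) lists the files to convert before running moloch-capture -R on that directory. The directory is hypothetical. Wireshark's editcap -F pcap is one tool that can do the conversion.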

If Arkime is failing to load a PCAP file check the following things:

How do I monitor multiple interfaces?

The easy way is using the interface setting in your config.ini. It supports a semicolon ';' separated list of interfaces to listen on for live traffic.

The hard way is to run multiple moloch-capture processes.

You only need to run one viewer on the machine. Unless it is started with the -n option, it will still use the hostname as the node name, so any special settings need to be set there (although default is usually good enough).

Arkime capture crashes

Please file an issue on github with the stack trace.

If it is easy to reproduce, sometimes it’s easier to just run gdb as root:

ERROR - pcap open failed

Usually moloch-capture is started as root so that it can open the interfaces, and then it immediately drops privileges to dropUser and dropGroup, which are nobody:daemon by default. This means that all parent directories need to be either owned or at least executable by nobody:daemon, and that the pcapDir itself must be writable.
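
You can check the permission rule above from the capture host. This is a minimal sketch with hypothetical helper names. It assumes a POSIX system and the default nobody:daemon drop user and group. It only looks at the classic owner/group/other mode bits, so it does not account for ACLs, supplementary groups, or SELinux.

```python
import grp
import pwd
from pathlib import Path

WRITE, EXECUTE = 2, 1


def has_perm(mode: int, owner_uid: int, owner_gid: int, uid: int, gid: int, want: int) -> bool:
    """Check classic owner/group/other mode bits (ignores ACLs and extra groups)."""
    if owner_uid == uid:
        bits = (mode >> 6) & 7
    elif owner_gid == gid:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


def pcap_dir_problems(pcap_dir: str, user: str = "nobody", group: str = "daemon") -> list[str]:
    """List directories that would stop dropUser:dropGroup from using pcap_dir."""
    uid = pwd.getpwnam(user).pw_uid
    gid = grp.getgrnam(group).gr_gid
    target = Path(pcap_dir).resolve()
    problems = []
    # Every parent directory must be searchable (execute bit) ...
    for parent in reversed(target.parents):
        st = parent.stat()
        if not has_perm(st.st_mode, st.st_uid, st.st_gid, uid, gid, EXECUTE):
            problems.append(f"{parent}: {user}:{group} needs execute permission")
    # ... and pcapDir itself must be writable and searchable.
    st = target.stat()
    if not has_perm(st.st_mode, st.st_uid, st.st_gid, uid, gid, WRITE | EXECUTE):
        problems.append(f"{target}: {user}:{group} needs write and execute permission")
    return problems
```

On a capture host, print(pcap_dir_problems("/data/moloch/raw")) should print an empty list when the layout is correct.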

How to reduce amount of traffic/pcap?

Listed in order from highest to lowest benefit to Arkime

  1. Setting the bpf= filter will stop Arkime from seeing the traffic.
  2. Adding CIDRs to the packet-drop-ips section will stop Arkime from adding packets to the PacketQ
  3. Using Rules it is possible to control if the packets are written to disk or the SPI data is sent to Elasticsearch

Life of a packet

Arkime capture supports many options for controlling which packets are captured, processed, and saved to disk.

PCAP Deletion

PCAP deletion is actually handled by the viewer process, so make sure the viewer process is running on all capture boxes. The viewer process checks on startup and then every minute to see how much space is available, and if it is below freeSpaceG, then it will start deleting the oldest file.

Note: freeSpaceG can also be a percentage, newer versions of Arkime use freeSpaceG=5% for the default. The viewer process will always leave at least 10 PCAP files on the disk, so make sure there is room for at least maxFileSizeG * 10 capture files on disk, or by default 120G.
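
Here is the free-space rule written out as a small sketch. It illustrates the rule and is not viewer's code. It assumes a plain freeSpaceG number means gibibytes (GiB) and that a value ending in % is a percentage of the filesystem size. The helper names are hypothetical.

```python
import shutil

GIB = 1024**3        # assumption: freeSpaceG counts GiB
MIN_FILES_KEPT = 10  # viewer always leaves at least 10 PCAP files


def free_space_target(free_space_g: str, total_bytes: int) -> float:
    """Bytes that should stay free on the PCAP filesystem."""
    if free_space_g.endswith("%"):
        return total_bytes * float(free_space_g[:-1]) / 100
    return float(free_space_g) * GIB


def should_delete_oldest(pcap_dir: str, free_space_g: str = "5%") -> bool:
    """True when free space has dropped below the freeSpaceG target."""
    usage = shutil.disk_usage(pcap_dir)
    return usage.free < free_space_target(free_space_g, usage.total)


def min_filesystem_size_g(max_file_size_g: float) -> float:
    """Space needed for the PCAP files viewer always keeps."""
    return max_file_size_g * MIN_FILES_KEPT


print(min_filesystem_size_g(12))  # 120
```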

If still having pcap delete issues:

  1. Make sure freeSpaceG is set correctly for the environment.
  2. Make sure there is free space where viewer is writing its logs.
  3. Make sure viewer can reach Elasticsearch
  4. Make sure that dropUser or dropGroup can actually delete files in the pcap directory and has read/execute permissions in all parent directories.
  5. Make sure the pcap directory is on a filesystem with at least maxFileSizeG * 10 space available.
  6. Make sure the files you think should be deleted show up on the files tab, if not use the db.pl sync-files command.
  7. Make sure the files in the files tab don’t have locked set; viewer won’t delete locked files
  8. Try restarting viewer
  9. The viewer.log should be printing out that it is trying to delete files

dontSaveBPFs doesn’t work

There are several common reasons dontSaveBPFs might not work for you.

  1. Make sure you've spelled it dontSaveBPFs, case matters
  2. Make sure you've placed dontSaveBPFs in the correct section, you can verify by adding a --debug to capture when starting and looking at the output
  3. Turns out BPF filters are tricky. :) When the network is using vlans, then at compile time, BPFs need to know that fact. So instead of a nice simple dontSaveBPFs=tcp port 443:10 use something like dontSaveBPFs=tcp port 443 or (vlan and tcp port 443):10. Basically FILTER or (vlan and FILTER). Information from here.
  4. Try testing your filter manually with tcpdump, you should only see the traffic you want to drop. So something like tcpdump -i INTERFACE tcp port 443 for example.
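
The vlan form is easy to get wrong by hand, so a tiny helper can generate it. This is a minimal sketch with hypothetical helper names. It assumes the dontSaveBPFs format of filter:packets entries separated by semicolons. It puts parentheses around each filter so that filters containing and/or keep their meaning. The result is equivalent to the FILTER or (vlan and FILTER) form above.

```python
def vlan_aware(bpf: str) -> str:
    """Wrap FILTER as (FILTER) or (vlan and (FILTER))."""
    return f"({bpf}) or (vlan and ({bpf}))"


def dont_save_bpfs(filters: dict[str, int]) -> str:
    """Build a dontSaveBPFs value: filter:packets entries joined by ';'."""
    return ";".join(f"{vlan_aware(bpf)}:{packets}" for bpf, packets in filters.items())


print("dontSaveBPFs=" + dont_save_bpfs({"tcp port 443": 10}))
# dontSaveBPFs=(tcp port 443) or (vlan and (tcp port 443)):10
```

As in step 4, you can test the generated filter first with something like tcpdump -i INTERFACE '(tcp port 443) or (vlan and (tcp port 443))'.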

If you are still having issues, you might just try out an Arkime Rules file. Arkime converts dontSaveBPFs into a rule for you behind the scenes.

Zero or missing bytes PCAP files

Arkime buffers writes to disk, which is great for high bandwidth networks, but bad for low bandwidth networks. How much data is buffered is controlled with pcapWriteSize, which defaults to 262144 bytes. An important thing to remember is the buffer is per thread, so set packetThreads to 1 on low bandwidth networks. A portion of the pcap that is buffered will be written after 10 seconds of no writes. However it will still buffer the last pagesize bytes, usually 4096 bytes.

An error that looks like ERROR - processSessionIdDisk - SESSIONID in file FILENAME couldn't read packet at FILEPOS packet # 0 of 2 usually means that either the pcap is still being buffered and you need to wait for it to be written to disk or that previously capture or the host crashed/restarted before the pcap could be written to disk.

You can also end up with many zero byte pcap files if the disk is full, see PCAP Deletion.

Can I virtualize Arkime with KVM using OpenVswitch?

In small environments with low amounts of traffic this is possible. With Open vSwitch you can create a mirror port from a physical or virtual adapter and send the data to another virtual NIC as the listening interface. In KVM, one issue is that it isn’t possible to increase the buffer size past 256 on the adapter when using the Virtio network adapter (mentioned in another part of the FAQ). Without a larger buffer, Arkime capture will continuously crash. To solve this in KVM, use the E1000 adapter and configure the buffer size accordingly. Set up the SPAN port on Open vSwitch to send traffic to it: https://www.rivy.org/2013/03/configure-a-mirror-port-on-open-vswitch/.

Installing MaxMind Geo free database files

MaxMind recently changed how you download their free database files. You now need to sign up for an account and set up the geoipupdate program. If you are using a version of Arkime before 2.2, you will also need to edit your config.ini file and update the GeoLite paths.

Instructions:

  1. Sign up for a MaxMind account (no purchase required)
  2. Wait for the MaxMind email and set your password
  3. Install the geoipupdate tool, paying attention to the version installed. For many distributions you can just run yum install geoipupdate or apt-get install geoipupdate
  4. Create a license key
  5. Select Yes when asked "Will this key be used for GeoIP Update?" and select the version you have
  6. Use the MaxMind feature to generate a config file for you, usually you will replace /etc/GeoIP.conf with this file
  7. Run geoipupdate as root and see if it works
  8. If you are using Moloch before 2.2, update your /data/moloch/etc/config.ini file so that geoLite2Country is now /usr/share/GeoIP/GeoLite2-Country.mmdb and geoLite2ASN is now /usr/share/GeoIP/GeoLite2-ASN.mmdb
  9. Restart moloch-capture
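For installs before 2.2, step 8 can be sanity-checked with a short Python sketch. It assumes the geoLite2Country and geoLite2ASN settings live in the [default] section of /data/moloch/etc/config.ini; check_geo_paths is a hypothetical helper, not an Arkime tool.

```python
import configparser
import os

# Paths from step 8 above.
EXPECTED_GEO_PATHS = {
    "geoLite2Country": "/usr/share/GeoIP/GeoLite2-Country.mmdb",
    "geoLite2ASN": "/usr/share/GeoIP/GeoLite2-ASN.mmdb",
}


def check_geo_paths(config_text: str, section: str = "default") -> list:
    """Hypothetical helper: list problems with the geoLite2* settings."""
    parser = configparser.ConfigParser(interpolation=None, strict=False,
                                       allow_no_value=True)
    parser.optionxform = str  # keep Arkime's camelCase option names
    parser.read_string(config_text)
    problems = []
    for key, expected in EXPECTED_GEO_PATHS.items():
        value = parser.get(section, key, fallback=None)
        if value is None:
            problems.append(f"{key} is not set")
        elif value != expected:
            problems.append(f"{key} is {value}, expected {expected}")
        elif not os.path.isfile(value):
            problems.append(f"{key} points to {value}, which does not exist yet")
    return problems


if __name__ == "__main__":
    with open("/data/moloch/etc/config.ini") as fh:
        for problem in check_geo_paths(fh.read()):
            print(problem)
```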

What do these log lines mean?

Arkime logs a lot of information for debugging purposes. Much of this information is meant for bug reports, but it can also be used to figure out what is going on. You may need to use --debug to enable these messages.

HTTP Responses

Jan 01 01:01:01 http.c:369 arkime_http_curlm_check_multi_info(): 8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms

  Jan 01 01:01:01                      Date
  http.c:369                           File name:line number
  arkime_http_curlm_check_multi_info   Function name
  8000/30                              8000 queued requests to the server; 30 connections to the server
  ASYNC                                Asynchronous request (SYNC for a synchronous request)
  200                                  HTTP status code
  http://eshost:9200/_bulk             Requested URL
  250342/5439                          250342 bytes uploaded (CURLINFO_SIZE_UPLOAD); 5439 bytes downloaded (CURLINFO_SIZE_DOWNLOAD)
  14ms                                 14ms to connect to the server (CURLINFO_CONNECT_TIME)
  12345ms                              12345ms total request time (CURLINFO_TOTAL_TIME)
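If you want to graph these values, a regular expression can pull the fields out of each line. This is a sketch that assumes the exact layout shown above (the format may differ between Arkime versions); parse_http_line is a hypothetical name.

```python
import re

# Layout as shown in this FAQ; other Arkime versions may differ.
HTTP_LINE = re.compile(
    r"(?P<date>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<file>[\w.]+):(?P<line>\d+) "
    r"(?P<function>\w+)\(\): "
    r"(?P<queued>\d+)/(?P<connections>\d+) "
    r"(?P<mode>ASYNC|SYNC) "
    r"(?P<status>\d+) "
    r"(?P<url>\S+) "
    r"(?P<uploaded>\d+)/(?P<downloaded>\d+) "
    r"(?P<connect_ms>\d+)ms "
    r"(?P<total_ms>\d+)ms"
)

INT_FIELDS = ("line", "queued", "connections", "status",
              "uploaded", "downloaded", "connect_ms", "total_ms")


def parse_http_line(text):
    """Hypothetical helper: dict of the HTTP response fields, or None."""
    match = HTTP_LINE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    for name in INT_FIELDS:
        fields[name] = int(fields[name])
    return fields


if __name__ == "__main__":
    sample = ("Jan 01 01:01:01 http.c:369 arkime_http_curlm_check_multi_info(): "
              "8000/30 ASYNC 200 http://eshost:9200/_bulk 250342/5439 14ms 12345ms")
    info = parse_http_line(sample)
    # Print a few of the parsed fields.
    print(info["queued"], info["status"], info["total_ms"])
```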

Periodic Packet Progress

Jan 01 01:01:01 packet.c:1185 arkime_packet_log(): packets: 3911000000 current sessions: 41771/45251 oldest: 0 - recv: 4028852297 drop: 123 (0.00) queue: 1 disk: 2 packet: 3 close: 4 ns: 5 frags: 0/1988 pstats: 4132185901/1/2/3/4/5

  Jan 01 01:01:01                 Date
  packet.c:1185                   File name:line number
  arkime_packet_log               Function name
  packets: 3911000000             3911000000 packets are going to be processed by the packet queues. These packets have made it past corrupt checks and packet-drop-ips checks, and are ones we most likely understand.
  current sessions: 41771/45251   41771 monitored sessions of the current session type (usually tcp); 45251 monitored sessions total
  oldest: 0                       In the current session type queue, the oldest session should be idled out in 0 seconds
  recv: 4028852297                4028852297 packets have been received by the interface since process start, as reported by the reader's stats API
  drop: 123                       123 packets have been dropped by the interface, as reported by the reader's stats API
  (0.00)                          0.00% of packets have been dropped by the interface, as reported by the reader's stats API
  queue: 1                        1 bulk request is waiting to be sent to the ES servers; each bulk request may hold multiple sessions
  disk: 2                         2 disk buffer writes are outstanding; each buffer holds multiple packets
  packet: 3                       3 packets are waiting to be processed in all the packet queues
  close: 4                        4 tcp sessions have been marked for closing (RST/FIN) and are waiting on the last few packets
  ns: 5                           5 sessions are ready to be saved, but a plugin (such as wise) is doing async work
  frags: 0/1988                   0 is always 0; 1988 current IP frags are waiting to be matched
  pstats: 4132185901/1/2/3/4/5    4132185901 packets successfully sent to a packet queue
                                  1 packet dropped because of the packet-drop-ips config
                                  2 packets dropped because the packet queues were overloaded
                                  3 packets dropped because they were corrupt
                                  4 packets dropped because we did not know how to process them
                                  5 packets dropped because of ipport rules
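The same approach works for the periodic packet line: collect the "name: value" pairs, then split the slash-separated counters. This sketch assumes the exact layout above; parse_packet_line and the dict keys it returns are hypothetical names chosen to match the descriptions in the list.

```python
import re

# "name: value" pairs as they appear in the packet log line.
PAIR = re.compile(r"([a-z]+(?: sessions)?): ([0-9/]+)")
DROP_PCT = re.compile(r"drop: \d+ \((\d+\.\d+)\)")
PSTATS_NAMES = ("queued", "packet_drop_ips", "overloaded",
                "corrupt", "unknown", "ipport_rules")


def parse_packet_line(text):
    """Hypothetical helper: named counters from arkime_packet_log(), or None."""
    pairs = dict(PAIR.findall(text))
    if "pstats" not in pairs or "current sessions" not in pairs:
        return None
    current, total = (int(n) for n in pairs["current sessions"].split("/"))
    pstats = [int(n) for n in pairs["pstats"].split("/")]
    pct = DROP_PCT.search(text)
    return {
        "packets": int(pairs["packets"]),
        "current_sessions": current,
        "total_sessions": total,
        "oldest_seconds": int(pairs["oldest"]),
        "recv": int(pairs["recv"]),
        "drop": int(pairs["drop"]),
        "drop_percent": float(pct.group(1)) if pct else None,
        "es_bulk_queue": int(pairs["queue"]),
        "disk_writes": int(pairs["disk"]),
        "packets_waiting": int(pairs["packet"]),
        "closing_sessions": int(pairs["close"]),
        "waiting_on_plugins": int(pairs["ns"]),
        "frags_waiting": int(pairs["frags"].split("/")[1]),
        "pstats": dict(zip(PSTATS_NAMES, pstats)),
    }
```

A steadily growing pstats "overloaded" count or es_bulk_queue value is easy to spot once the numbers are in a dict.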


Viewer

Where do I learn more about the expressions available

Click on the owl and read the Search Bar section. The Fields section is also useful for discovering fields you can use in a search expression.

Exported pcap files are corrupt, sometimes session detail fails

The most common cause of this problem is that the clocks on the Arkime machines are out of sync. Make sure NTP is running everywhere, or otherwise keep the time on all machines in sync.

Map counts are wrong

What browsers are supported?

Recent versions of Chrome, Firefox, and Safari should all work fairly equally. Below are the minimum versions required. We aren’t kidding.

Development and testing is done mostly with Chrome on a Mac, so it gets the most attention.

Error: getaddrinfo EADDRINFO

This error seems to occur when proxying requests from one viewer node to another when the machines don't use FQDNs for their hostnames and the short hostnames are not resolvable by DNS. You can check whether your machine uses an FQDN by running the hostname command. There are several options to resolve the error:

  1. Use the --host option on capture
  2. Configure the OS to use FQDNs.
  3. Make it so DNS can resolve the shortnames or add the shortnames to the hosts file.
  4. Edit config.ini and add a viewUrl for each node. This part of the config file must be the same on all machines (we recommend you just use the same config file everywhere). Example:
    [node1_eth0]
    interface=eth0
    viewUrl=http://node1.fqdn
    [node1_eth1]
    interface=eth1
    viewUrl=http://node1.fqdn
    [node2]
    interface=eth1
    viewUrl=http://node2.fqdn
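To catch a node section that is missing its viewUrl before it causes proxy errors, a small check over config.ini might look like this. missing_view_urls is a hypothetical helper; it assumes you pass in the node names you start capture and viewer with (-n).

```python
import configparser
import sys


def missing_view_urls(config_text, node_names):
    """Hypothetical helper: node names whose [nodename] section lacks viewUrl."""
    parser = configparser.ConfigParser(interpolation=None, strict=False,
                                       allow_no_value=True)
    parser.optionxform = str  # option names are case sensitive (viewUrl)
    parser.read_string(config_text)
    missing = []
    for node in node_names:
        if not parser.has_section(node) or not parser.get(node, "viewUrl", fallback=None):
            missing.append(node)
    return missing


if __name__ == "__main__":
    # usage: python check_viewurl.py /data/moloch/etc/config.ini node1_eth0 node1_eth1 node2
    with open(sys.argv[1]) as fh:
        print(missing_view_urls(fh.read(), sys.argv[2:]))
```

Run it against the same config.ini on every machine, since that section must match everywhere.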

How do I proxy Arkime using Apache

Apache, and other web servers, can be set up as a reverse proxy to provide authentication or other services for Arkime. When a reverse proxy is used for authentication, it must be inline; Arkime will not do authentication itself, but it will still do authorization. Arkime uses a username that the reverse proxy passes to it in an HTTP header for settings and authorization. See the architecture page for diagrams. While operators will use the proxy to reach the Arkime viewer, the viewer processes still need direct access to each other.

I still get prompted for password after setting up Apache auth

  1. Make sure the user has "Web Auth Header" checked.
  2. Make sure userNameHeader in the viewer config is the lower case version of the header Apache is using.
  3. Run viewer.js with --debug and check whether the header is being sent.

How do I search multiple Arkime clusters

It is possible to search multiple Arkime clusters by setting up a special multi-cluster Arkime viewer and a special MultiES process. MultiES is similar to Elasticsearch tribe nodes, except that it was created before tribe nodes and can deal with multiple indexes having the same name. Prior to Moloch 1.5, all Moloch clusters had to use the same rotateIndex setting. Since Moloch 1.5, if the clusters use different rotateIndex settings, set queryAllIndices=true in the molochviewer section. One limitation remains: all Arkime clusters must use the same passwordSecret.

To use MultiES, create another config.ini file or section in a shared config file. Both multies.js and the special "all" viewer can use the same node name.

# viewer/multies node name (-n allnode)
[allnode]
# The host and port multies is running on, set with multiESHost:multiESPort usually just run on the same host
elasticsearch=127.0.0.1:8200
# This is a special multiple arkime cluster viewer
multiES=true
# Port the multies.js program is listening on, elasticsearch= must match
multiESPort = 8200
# Host the multies.js program is listening on, elasticsearch= must match
multiESHost = localhost
# Semicolon list of elasticsearch instances, one per arkime cluster.  The first one listed will be used for settings
multiESNodes = es-cluster1.example.com:9200;es-cluster2.example.com:9200
# Uncomment if using different rotateIndex settings
#queryAllIndices=true

Now start both the multies.js program and viewer.js with the same config file AND -n allnode. All other viewer settings, including webBasePath, can still be used.

By default, the users table comes from the first cluster listed in multiESNodes. This can be overridden by setting usersElasticsearch and optionally usersPrefix in the multi viewer config file.
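The "must match" comments in the config above can be checked before starting either process. This sketch is hypothetical and not shipped with Arkime. It treats localhost and 127.0.0.1 as the same host, which is an assumption based on the example pairing elasticsearch=127.0.0.1:8200 with multiESHost = localhost.

```python
import configparser

LOOPBACK_NAMES = {"localhost", "127.0.0.1"}  # assumption: same host


def check_multies(config_text, node="allnode"):
    """Hypothetical helper: return (problems, es_nodes) for the multi viewer section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False,
                                       allow_no_value=True)
    parser.optionxform = str
    parser.read_string(config_text)
    section = parser[node]
    problems = []

    if (section.get("multiES") or "").strip().lower() != "true":
        problems.append("multiES=true is not set")

    # elasticsearch= may or may not carry a scheme; compare host and port only.
    es = (section.get("elasticsearch") or "").split("://", 1)[-1].rstrip("/")
    es_host, _, es_port = es.rpartition(":")
    host = (section.get("multiESHost") or "").strip()
    port = (section.get("multiESPort") or "").strip()
    if es_port != port:
        problems.append(f"elasticsearch port {es_port!r} != multiESPort {port!r}")
    if es_host != host and not {es_host, host} <= LOOPBACK_NAMES:
        problems.append(f"elasticsearch host {es_host!r} != multiESHost {host!r}")

    # One Elasticsearch per Arkime cluster; the first one is used for settings.
    es_nodes = [n.strip() for n in (section.get("multiESNodes") or "").split(";")
                if n.strip()]
    if not es_nodes:
        problems.append("multiESNodes is empty")
    return problems, es_nodes
```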

How do I use self-signed SSL/TLS Certificates with MultiES?

Create a file, for example CAcerts.pem, containing one or more trusted certificates in PEM format.

Then start MultiES with the NODE_EXTRA_CA_CERTS environment variable set to the path of the file you just created, for example:

NODE_EXTRA_CA_CERTS=./CAcerts.pem /data/moloch/bin/node multies.js -c /data/moloch/etc/config.ini -n allnode
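If your self-signed certificates are spread across several files, the hypothetical Python helper below concatenates the PEM certificate blocks into a single CAcerts.pem for NODE_EXTRA_CA_CERTS. It only copies PEM text; it does not validate the certificates, and the input file names in the usage comment are made up.

```python
import re
import sys
from pathlib import Path

# One PEM certificate block, including its BEGIN/END lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.*?-----END CERTIFICATE-----", re.DOTALL
)


def build_ca_bundle(sources, bundle_path="CAcerts.pem"):
    """Hypothetical helper: write every PEM certificate in `sources` to `bundle_path`.

    Returns the number of certificates written; keys and other PEM blocks are skipped.
    """
    blocks = []
    for src in sources:
        found = PEM_CERT.findall(Path(src).read_text())
        if not found:
            raise ValueError(f"{src}: no PEM certificate found")
        blocks.extend(found)
    Path(bundle_path).write_text("\n".join(blocks) + "\n")
    return len(blocks)


if __name__ == "__main__":
    # usage: python build_bundle.py es1-ca.pem es2-ca.pem
    print(build_ca_bundle(sys.argv[1:]), "certificates written to CAcerts.pem")
```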

How do I reset my password?

An admin can change anyone’s password on the Users tab by clicking the Settings link in the Actions column next to the user.

A password can also be changed with the addUser script, which replaces the entire account if the same userid is used. All preferences and views will be cleared, so if you need to change an admin user's password, creating a secondary admin account may be a better option. After creating the secondary admin account, change the user's password and then delete the secondary admin account.

node addUser -c <configfilepath> <user id> <user friendly name> <password> [--admin]

Error: Couldn’t connect to remote viewer, only displaying SPI data

Viewers have the ability to proxy traffic for each other. The ability relies on Arkime node names that are mapped to hostnames. Common problems are when systems don’t use FQDNs or certs don’t match.

How do viewers find each other

First the SPI records are created on the moloch-capture side.

  1. Each moloch-capture gets a nodename, either from the -n command line option or from everything in front of the first period of the hostname.
  2. Each moloch-capture writes a stats record every few seconds that maps the nodename to the FQDN. It is possible to override the FQDN with the --host option to capture.
  3. Each SPI record has a nodename in it.

When pcap is retrieved from a viewer it uses the nodename associated with the SPI record to find which capture host to connect to.

  1. Each arkime-viewer process gets a nodename, either by the -n command line option or everything in front of the first period of the hostname.
  2. If the SPI record nodename is the same as the arkime-viewer nodename it can be processed locally, STOP HERE. This is the common case with one arkime node.
  3. If the stats[nodename].hostname is the same as the arkime-viewer’s hostname (exact match) then it can be processed locally, STOP HERE. Remember this is written by capture above, either the FQDN or --host. This is the common case with multiple capture processes per capture node.
  4. If we make it here, the pcap data isn’t local and it must be proxied.
  5. If there is a viewUrl set in the [nodename] section, use that.
  6. If there is a viewUrl set in the [default] section, use that.
  7. Use stats[nodename].hostname:[nodename section - viewPort setting]
  8. Use stats[nodename].hostname:[default section - viewPort setting]
  9. Use stats[nodename].hostname:8005
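The lookup order above can be written as a short function. This is a sketch of the documented steps, not Arkime's actual viewer code; the function names and the dict-shaped config and stats inputs are assumptions made for illustration.

```python
def default_node_name(hostname):
    """Without -n, the node name is everything before the first period."""
    return hostname.split(".", 1)[0]


def pcap_location(spi_node, viewer_node, viewer_hostname, stats_hostnames, config):
    """Hypothetical sketch: return "local" or where the pcap request is proxied.

    stats_hostnames: nodename -> hostname (FQDN or --host) from capture stats records.
    config: section name -> {setting: value}, e.g. {"default": {...}, "node2": {...}}.
    """
    # Step 2: the SPI record belongs to this viewer's node.
    if spi_node == viewer_node:
        return "local"
    # Step 3: a different node name, but capture runs on this exact host.
    remote_host = stats_hostnames.get(spi_node)
    if remote_host is not None and remote_host == viewer_hostname:
        return "local"
    # Steps 5 and 6: an explicit viewUrl wins, node section first.
    node_cfg = config.get(spi_node, {})
    default_cfg = config.get("default", {})
    for cfg in (node_cfg, default_cfg):
        if cfg.get("viewUrl"):
            return cfg["viewUrl"]
    # Steps 7 to 9: stats hostname plus viewPort, falling back to 8005.
    if remote_host is None:
        raise LookupError(f"no stats record (hostname) for node {spi_node!r}")
    port = node_cfg.get("viewPort") or default_cfg.get("viewPort") or "8005"
    return f"{remote_host}:{port}"
```

Walking a failing node name through this by hand often shows which of the fixes below applies.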

Possible fixes

First, look at viewer.log on both the viewer machine and the remote machine and see if there are any obvious errors. The most common problems are:

  1. Not using the same config.ini on all nodes can make things a pain to debug and sometimes keeps them from working at all. It is best to use the same config everywhere, with a different [nodename] section for each node name.
  2. The remote machine doesn’t return an FQDN from the hostname command AND the viewer machine can’t resolve just the hostname. To fix this, do ONE of the following:
    1. Use the --host option to moloch-capture and restart capture
    2. Make it so the remote machine returns an FQDN (run hostname "fullname" as root and edit /etc/sysconfig/network)
    3. Set a viewUrl in each node section of the config.ini. If you don’t have a node section for each host, you’ll need to create one.
    4. Edit /etc/resolv.conf and add search foo.example.com, where foo.example.com is the subdomain of the hosts. Basically, you want "telnet shortname 8005" to work from the viewer machine to the remote machine.
  3. The remote machine’s FQDN doesn’t match the CN or SANs in the cert it is presenting. The fixes are the same as #2 above.
  4. The remote machine is using a self signed cert. To fix this, either turn off HTTPS or see the certificate answer above.
  5. The remote machine can’t open the PCAP. Make sure the dropUser user can read the pcap files. Check the directories in the path too.
  6. Make sure all viewers are either using HTTPS or not using HTTPS; if only some are using HTTPS, you need to set viewUrl for each node.
    1. When troubleshooting this issue, it is sometimes easier to disable HTTPS everywhere
  7. If you want to change the hostname of a capture node:
    1. Change your mind :)
    2. Reuse the same node name as previously with a -n option
    3. Use the viewUrl for that old nodename that points to the new host.

Compiled against a different Node.js version error

Arkime uses Node.js for the viewer component and requires many packages to work fully. These packages must be compiled with and run using the same version of Node.js. An error like "… was compiled against a different Node.js version using NODE_MODULE_VERSION 48. This version of Node.js requires NODE_MODULE_VERSION 57." means that the version of Node.js used to install the packages differs from the version used to run them.

This shouldn’t happen when using the prebuilt Arkime releases. If it does, then double check that /data/moloch/bin/node is being used to run viewer.

If you built Arkime yourself, this usually happens if you have a different version of node in your path. You will need to rebuild Arkime and either:


Parliament

Sample Apache Config

Parliament is designed to run behind a reverse proxy such as Apache. Basically, you just need to tell Apache to send all root requests and any /parliament requests to the Parliament server.

ProxyPassMatch   ^/$ http://localhost:8008/parliament retry=0
ProxyPass        /parliament/ http://localhost:8008/parliament/ retry=0
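To double check which requests these two directives send to Parliament, the path mapping can be mirrored in Python. This only illustrates the path rewriting (query strings and Apache's other behavior are ignored), and parliament_upstream is a hypothetical name.

```python
PARLIAMENT = "http://localhost:8008"  # backend address from the sample config


def parliament_upstream(path):
    """Hypothetical helper: backend URL Apache would proxy `path` to, or None."""
    # ProxyPassMatch ^/$ http://localhost:8008/parliament
    if path == "/":
        return PARLIAMENT + "/parliament"
    # ProxyPass /parliament/ http://localhost:8008/parliament/ (prefix match)
    if path.startswith("/parliament/"):
        return PARLIAMENT + path
    return None
```

Note that /parliament without a trailing slash matches neither rule in this sketch.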


WISE

WISE is not working

Here is the common checklist:

  1. Check that WISE is running
    curl http://localhost:8081/fields
    You should see a list of fields that WISE knows about.
  2. Check that in your config.ini file you have
    • added wise.so to the plugins line
    • added wise.js to the viewerPlugins line
    • set wiseURL, or the older wiseHost and wisePort
  3. Check that you can reach the WISE host from the capture/viewer hosts and that there are no ACL issues.
    curl http://WISEHOST:8081/fields
  4. Restarting moloch-capture with the --debug option may print useful information about what is wrong. Make sure that WISE is being called with the correct URL, and verify that the plugins, wiseHost, and wiseURL settings are what you actually think they are.
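Steps 1 to 3 of this checklist can be scripted. This is a hypothetical sketch: it assumes the WISE settings are in the [default] section and that plugins and viewerPlugins are semicolon-separated lists, and it treats any HTTP 200 from /fields as "WISE is running".

```python
import configparser
import urllib.request


def _split(value):
    """Split a semicolon-separated list setting into stripped items."""
    return [item.strip() for item in (value or "").split(";") if item.strip()]


def wise_config_problems(config_text, section="default"):
    """Hypothetical helper for step 2: list of problems (empty if OK)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False,
                                       allow_no_value=True)
    parser.optionxform = str
    parser.read_string(config_text)

    def get(key):
        return parser.get(section, key, fallback=None)

    problems = []
    if "wise.so" not in _split(get("plugins")):
        problems.append("wise.so missing from plugins")
    if "wise.js" not in _split(get("viewerPlugins")):
        problems.append("wise.js missing from viewerPlugins")
    if not (get("wiseURL") or get("wiseHost")):
        problems.append("neither wiseURL nor wiseHost is set")
    return problems


def wise_is_up(base_url="http://localhost:8081", timeout=5.0):
    """Hypothetical helper for steps 1 and 3: True if GET /fields returns HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/fields",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeouts, DNS and HTTP errors
        return False
```

Run wise_is_up("http://WISEHOST:8081") from each capture and viewer host to cover step 3.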

How can I contribute?

Want to add or edit this FAQ? Found an issue in this site? This site's code is open source. Please contribute!
