Python
Starting with version 6, Arkime supports Python scripting for custom processing of packets and sessions.
This allows you to write custom classifiers and parsers in Python.
Python support in Arkime requires Python 3.12 or newer, so it may not be available on older Linux distributions.
Use the setting disablePython=true to disable Python support in Arkime.
Currently Python support is unavailable in the AL2023, Ubuntu 22, and Debian 12 Arkime packages.
Python Arkime Module
The Python Arkime module has high level methods to register callbacks for packet processing.
Constants
- VERSION : String - The Arkime version as a string
- CONFIG_PREFIX : String - The Arkime install prefix, usually /opt/arkime
- API_VERSION : Integer - The Arkime API version from arkime.h
PortKind - The PortKind for register_port_classifier. Bitwise OR the values together to match multiple ports.
- PORT_UDP_SRC : Integer - Match UDP source port
- PORT_UDP_DST : Integer - Match UDP destination port
- PORT_TCP_SRC : Integer - Match TCP source port
- PORT_TCP_DST : Integer - Match TCP destination port
- PORT_SCTP_SRC : Integer - Match SCTP source port
- PORT_SCTP_DST : Integer - Match SCTP destination port
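As a rough sketch of combining PortKind values, the snippet below ORs two constants and passes the result to register_port_classifier. It assumes the constants are exposed as attributes of the arkime module, as the Constants list suggests. The classifier name "example53", port 53, and the helper register_example are illustrative only. The import is guarded so the file can also be loaded outside of capture.

```python
def classify_cb(session, packet_bytes, packet_len, which):
    """Placeholder classifier; a real one would inspect packet_bytes."""
    return None


def register_example(arkime_mod, port=53):
    """Register one classifier for both UDP and TCP destination ports.

    arkime_mod is the arkime module, passed in so the function can be
    exercised with a stand-in object outside of capture.
    """
    # Bitwise OR makes the classifier match the port on either transport.
    port_kind = arkime_mod.PORT_UDP_DST | arkime_mod.PORT_TCP_DST
    arkime_mod.register_port_classifier("example53", port, port_kind, classify_cb)
    return port_kind


try:
    import arkime  # assumed to be importable only inside Arkime capture
except ImportError:
    arkime = None

if arkime is not None:
    register_example(arkime)
```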
Callbacks
classifyCb(session, packetBytes, packetLen, which)
Classifier callback for identifying protocols. Called for the first packet in each direction that matches a registered tcp/udp/sctp/port classifier. The callback should look at the bytes and decide whether it understands the protocol. If it does, it usually calls arkime_session.add_protocol() and/or arkime_session.register_parser().
- session: Opaque session handle, use with arkime_session module methods.
- packetBytes: Read-only memoryview of packet bytes; only valid during callback.
- packetLen: Length of the packet in bytes.
- which: For TCP/UDP: 0 = client to server, 1 = server to client. For SCTP: direction in bit 0, stream ID in upper bits (use which & 1 for direction).
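The classifier flow described above might look like the following minimal sketch. The 4-byte signature MAGIC, the protocol name "myproto", and the helper names are hypothetical. The arkime_session module is passed in as session_api so the logic can be tried with a stand-in object outside of capture.

```python
MAGIC = b"MYP1"  # hypothetical 4-byte protocol signature


def looks_like_myproto(packet_bytes):
    """Return True when the payload starts with the hypothetical signature."""
    return bytes(packet_bytes[:len(MAGIC)]) == MAGIC


def make_classify_cb(session_api, parser_cb):
    """Build a classifyCb bound to the arkime_session module (or a stand-in)."""

    def classify_cb(session, packet_bytes, packet_len, which):
        if not looks_like_myproto(packet_bytes):
            return  # not our protocol, leave the session untouched
        # Label the session and ask for every later packet of it.
        session_api.add_protocol(session, "myproto")
        session_api.register_parser(session, parser_cb)

    return classify_cb
```

Inside capture, the resulting callback would be handed to one of the register_*_classifier methods, for example with MAGIC as matchBytes and 0 as matchOffset.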
long parserCb(session, packetBytes, packetLen, which)
Parser callback for protocol dissection. Called for every packet of a session in each direction after being registered with arkime_session.register_parser().
- session: Opaque session handle, use with arkime_session module methods.
- packetBytes: Read-only memoryview of packet bytes; only valid during callback.
- packetLen: Length of the packet in bytes.
- which: For TCP/UDP: 0 = client to server, 1 = server to client. For SCTP: direction in bit 0, stream ID in upper bits (use which & 1 for direction).
- Returns:
- -1: Unregister parser (no more callbacks for this session)
- 0: Normal case, continue receiving packets
- >0: Number of bytes consumed (used when this protocol wraps others)
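To illustrate the direction bit and the -1 and 0 return codes, here is a small sketch of a parser callback. The "QUIT" command and the function names are made up for the example. Masking with which & 1 works for TCP/UDP and for SCTP, where the upper bits carry the stream ID.

```python
def direction_of(which):
    """Direction bit: 0 = client to server, 1 = server to client."""
    return which & 1


def example_parser_cb(session, packet_bytes, packet_len, which):
    """Demonstrates the return codes only; no fields are set."""
    # Hypothetical rule: stop parsing once the client sends "QUIT".
    if direction_of(which) == 0 and bytes(packet_bytes[:4]) == b"QUIT":
        return -1  # unregister: no more callbacks for this session
    return 0  # normal case: keep receiving packets
```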
saveCb(session, final)
Session save callback. Used for both pre_save and save callbacks.
- session: Opaque session handle, use with arkime_session module methods.
- final: Non-zero if this is the final save for this session, 0 if more linked sessions follow.
Methods
fieldPos field_define(fieldExpression, fieldDefinition)
Create a new field that can be used in sessions. Must be called at startup, not from callbacks.
- fieldExpression: The expression used in viewer to access the field (e.g., “myproto.field”).
- fieldDefinition: The field definition in custom-fields format.
Format: semicolon-separated key:value pairs using the keys db, kind, friendly, count, and help.
Example: "db:myproto.field;kind:termfield;friendly:My Field;count:true;help:Description"
Types: termfield, integer, ip, lotermfield, uptermfield, seconds, textfield
Returns int: The field position for use with add_string/add_int (faster than using expression string).
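A small helper can make the definition string less error-prone to assemble. The function below is a hypothetical convenience, not part of the Arkime API. It only joins the documented keys with semicolons and rejects values that would break that separator.

```python
def build_field_definition(db, kind, friendly=None, count=False, help_text=None):
    """Assemble a field definition string from the documented keys."""
    parts = [("db", db), ("kind", kind)]
    if friendly is not None:
        parts.append(("friendly", friendly))
    if count:
        parts.append(("count", "true"))
    if help_text is not None:
        parts.append(("help", help_text))
    for key, value in parts:
        # A semicolon inside a value would be read as a separator.
        if ";" in value:
            raise ValueError(f"{key} value must not contain ';'")
    return ";".join(f"{key}:{value}" for key, value in parts)
```

The result would then be passed as the second argument to field_define, for example arkime.field_define("myproto.field", build_field_definition("myproto.field", "termfield")).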
fieldPos field_get(fieldExpression)
Retrieve the field position for a previously defined field expression.
- fieldExpression: The expression used in viewer to access the field (e.g., “myproto.field”).
Returns int: The field position, or -1 if the field does not exist.
register_port_classifier(name, port, portKind, classifyCb)
Register a classifier that matches on a specific port and protocol type. This usually isn’t recommended since most protocols can run on any port.
- name: The short name of the classifier, used internally to identify the classifier.
- port: The IP port to match on.
- portKind: Bitwise OR the values from the PortKind constants to match on.
- classifyCb: The callback to call when the classifier matches.
register_pre_save(saveCb)
Register a callback to be called before a session is saved to the database. Called before housekeeping such as running save rules, so fields added here can trigger rules.
- saveCb: The callback function with signature (session, final) -> None.
register_save(saveCb)
Register a callback to be called when a session is being saved to the database. This is the final opportunity to add fields or tags before the session is written.
- saveCb: The callback function with signature (session, final) -> None.
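As a sketch of a save callback that uses the final argument, the snippet below tags a session only on its final save. The tag "example-final" and the factory make_save_cb are hypothetical. The session module is injected so the callback can be tested with a stand-in, and registration only happens when the Arkime modules can be imported.

```python
def make_save_cb(session_api, tag="example-final"):
    """Build a saveCb that tags the session on its final save only."""

    def save_cb(session, final):
        if final:  # non-zero means no more linked sessions follow
            session_api.add_tag(session, tag)

    return save_cb


try:
    import arkime  # assumed to be importable only inside Arkime capture
    import arkime_session
except ImportError:
    pass  # running outside capture; nothing to register
else:
    arkime.register_save(make_save_cb(arkime_session))
```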
register_sctp_classifier(name, matchOffset, matchBytes, classifyCb)
Register an SCTP classifier that will call the classifyCb callback for the first packet of a session in each direction that matches the matchBytes starting at the matchOffset.
- name: The short name of the classifier, used internally to identify the classifier.
- matchOffset: The byte offset in the packet where the matchBytes should be found.
- matchBytes: The bytes to match in the packet.
- classifyCb: The callback to call when the classifier matches. The which field will contain the direction AND sctp stream id. Arkime will send full messages to the callback.
register_sctp_protocol_classifier(name, protocol, classifyCb)
Register an SCTP protocol classifier that will call the classifyCb callback for the first packet of a session in each direction that matches the protocolId in the SCTP header.
- name: The short name of the classifier, used internally to identify the classifier.
- protocol: The protocol id in the SCTP header to match.
- classifyCb: The callback to call when the classifier matches. The which field will contain the direction AND sctp stream id. Arkime will send full messages to the callback.
register_tcp_classifier(name, matchOffset, matchBytes, classifyCb)
Register a TCP classifier that will call the classifyCb callback for the first packet of a session in each direction that matches the matchBytes starting at the matchOffset.
- name: The short name of the classifier, used internally to identify the classifier.
- matchOffset: The byte offset in the packet where the matchBytes should be found.
- matchBytes: The bytes to match in the packet.
- classifyCb: The callback to call when the classifier matches.
register_udp_classifier(name, matchOffset, matchBytes, classifyCb)
Register a UDP classifier that will call the classifyCb callback for the first packet of a session in each direction that matches the matchBytes starting at the matchOffset.
- name: The short name of the classifier, used internally to identify the classifier.
- matchOffset: The byte offset in the packet where the matchBytes should be found.
- matchBytes: The bytes to match in the packet.
- classifyCb: The callback to call when the classifier matches.
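The matchOffset/matchBytes rule used by the TCP, UDP, and SCTP classifiers can be pictured with a pure-Python stand-in. This is only an illustration of the documented behavior, not Arkime's implementation, and the function name is made up.

```python
def matches(packet_bytes, match_offset, match_bytes):
    """Return True when match_bytes appears in packet_bytes at match_offset."""
    end = match_offset + len(match_bytes)
    return bytes(packet_bytes[match_offset:end]) == match_bytes
```

Note that an empty match_bytes compares equal to an empty slice, so under this reading it matches every packet.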
Variables
CONFIG_PREFIX
The Arkime install prefix, usually /opt/arkime
VERSION
The Arkime version as a string
Python Arkime Session Module
The Python Arkime Session module has methods for dealing with sessions. The API is very unpythonic and treats the session as an opaque object that needs to be passed around.
Methods
add_int(session, fieldPosOrExp, value)
Add an integer value to a session field.
- session: The session object from the classifyCb or parserCb.
- fieldPosOrExp: The field position returned by field_define/field_get or the field expression.
- value: The integer value to add to the session field.
Returns bool: True if the value was added, False if it was a duplicate or field doesn’t exist.
add_protocol(session, protocol)
Optimized version of add_string(session, ‘protocol’, protocol).
- session: The session object from the classifyCb or parserCb.
- protocol: The protocol string to add to the session.
add_string(session, fieldPosOrExp, value)
Add a string value to a session field.
- session: The session object from the classifyCb or parserCb.
- fieldPosOrExp: The field position returned by field_define/field_get or the field expression.
- value: The string value to add to the session field.
Returns bool: True if the value was added, False if it was a duplicate or field doesn’t exist.
add_tag(session, tag)
Optimized version of add_string(session, ‘tags’, tag).
- session: The session object from the classifyCb or parserCb.
- tag: The tag string to add to the session.
decref(session)
Decrement the reference count of a session. Call after incref when done with async operations. The session may be freed when the reference count reaches zero.
- session: The session object previously passed to incref.
get(session, fieldPosOrExp)
Retrieve the value of a session field.
- session: The session object from the classifyCb or parserCb.
- fieldPosOrExp: The field position returned by field_define/field_get or the field expression.
Returns: The field value. A list for multi-value fields, a single value for single-value fields, or None if the field is not set.
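Because get can hand back a list, a single value, or None, callers may find it convenient to normalize the result. The helper below is a hypothetical convenience, not part of the module.

```python
def as_list(value):
    """Normalize a get() result to a list, whatever shape it came in."""
    if value is None:
        return []  # field not set
    if isinstance(value, list):
        return value  # multi-value field
    return [value]  # single-value field
```

Inside a callback this would be used as as_list(arkime_session.get(session, "ip.src")).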
get_attr(session, key)
Retrieve a Python object previously associated with the session via set_attr.
- session: The session object from the classifyCb or parserCb.
- key: The attribute key used in set_attr.
Returns: The stored Python object, or None if the key does not exist.
has_protocol(session, protocol)
Check if a protocol has been added to the session.
- session: The session object from the classifyCb or parserCb.
- protocol: The protocol string to check for.
Returns bool: True if the protocol is present, False otherwise.
incref(session)
Increment the reference count of a session. Use when storing a session handle for later use outside of the callback (e.g., async operations). Must call decref when done.
- session: The session object from the classifyCb or parserCb.
register_parser(session, parserCb)
Register a parser callback for every packet of the session.
- session: The session object from the classifyCb or parserCb.
- parserCb: The callback to call for every packet of the session in each direction.
set_attr(session, key, value)
Associate a Python object with the session. This is useful for storing state between calls to the parserCb. The key is global across all Python modules, so use a unique key to avoid collisions.
- session: The session object from the classifyCb or parserCb.
- key: The attribute key.
- value: The Python object to store.
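As a sketch of keeping state between parserCb calls, the parser below counts bytes per direction using get_attr and set_attr. The key name and the factory are hypothetical. The key carries a module-specific prefix because keys are global across all Python modules. The session module is injected so the logic can be tried with a stand-in.

```python
STATE_KEY = "example_py.bytes_seen"  # hypothetical key, prefixed to avoid collisions


def make_counting_parser(session_api):
    """Build a parserCb that tracks bytes seen in each direction."""

    def parser_cb(session, packet_bytes, packet_len, which):
        state = session_api.get_attr(session, STATE_KEY)
        if state is None:
            state = [0, 0]  # [client to server, server to client]
        state[which & 1] += packet_len
        # Store the state again so the sketch does not depend on
        # get_attr returning the same object that was stored.
        session_api.set_attr(session, STATE_KEY, state)
        return 0  # keep receiving packets

    return parser_cb
```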
Python Arkime Packet Module
The Python Arkime Packet module has methods for dealing with packets before they are associated with sessions. The API is very unpythonic and treats the packet as an opaque object that needs to be passed around.
Constants
PacketRC
The return values for a packetCb callback.
- DO_PROCESS : Integer - Process the packet normally
- CORRUPT : Integer - The packet is corrupt
- UNKNOWN_ETHER : Integer - Unknown Ethernet type encountered
- UNKNOWN_IP : Integer - Unknown IP protocol encountered
- DONT_PROCESS : Integer - The packet should not be processed but can be freed
- DONT_PROCESS_OR_FREE : Integer - The packet should not be processed and should not be freed
Callbacks
PacketRC packetCb(batch, packet, packetBytes, packetLen)
Low-level packet callback for handling custom Ethernet types or IP protocols. Called by reader threads for registered ethertypes/protocols. Usually strips headers and calls arkime_packet.run_ethernet_cb() or arkime_packet.run_ip_cb().
- batch: Opaque batch handle, pass to run_ethernet_cb/run_ip_cb.
- packet: Opaque packet handle, use with arkime_packet.get()/set().
- packetBytes: Read-only memoryview of packet bytes; only valid during callback.
- packetLen: Length of the packet in bytes.
- Returns:
- PacketRC: Return result from run_*_cb(), or DO_PROCESS/CORRUPT/DONT_PROCESS/etc.
Methods
get(packet, field)
Retrieve the value of a packet field.
- packet: The packet object from the packetCb.
- field: The string field name to retrieve.
- copied - Integer - 0 = not copied, 1 = copied
- direction - Integer - 0 = client to server, 1 = server to client
- etherOffset - Integer - Offset of ethernet header in packet
- ipOffset - Integer - Offset of IP header in packet
- ipProtocol - Integer - IP protocol number (6=TCP, 17=UDP, 132=SCTP, etc.)
- mProtocol - Integer - The Arkime mProtocol number
- outerEtherOffset - Integer - Offset of outer ethernet header for tunneled packets
- outerIpOffset - Integer - Offset of outer IP header for tunneled packets
- outerv6 - Integer - 1 if outer IP is IPv6, 0 if IPv4
- payloadLen - Integer - Length of the payload
- payloadOffset - Integer - Offset of the payload in packet
- pktlen - Integer - The full packet length
- readerFilePos - Integer - The file position of the packet in the pcap file
- readerPos - Integer - Index of the reader internal data
- tunnel - Integer - Bitflags: 0x01=GRE, 0x02=PPPoE, 0x04=MPLS, 0x08=PPP, 0x10=GTP, 0x20=VXLAN, 0x40=VXLAN-GPE, 0x80=Geneve
- v6 - Integer - 1 if IP is IPv6, 0 if IPv4
- vlan - Integer - The first VLAN tag if present, 0 if not present
- vni - Integer - The VXLAN VNI if present, 0 if not present
- wasfrag - Integer - 1 if the packet was a fragment, 0 if not
- writerFileNum - Integer - The writer file number from files index
- writerFilePos - Integer - The offset in the writer file
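The tunnel field is a set of bitflags, so a small decoder can turn it into readable names. The table below copies the flag values listed for the tunnel field. The function name is made up for the example.

```python
TUNNEL_FLAGS = {
    0x01: "GRE",
    0x02: "PPPoE",
    0x04: "MPLS",
    0x08: "PPP",
    0x10: "GTP",
    0x20: "VXLAN",
    0x40: "VXLAN-GPE",
    0x80: "Geneve",
}


def tunnel_names(tunnel):
    """Return the names of all tunnel flags set in the integer value."""
    return [name for bit, name in TUNNEL_FLAGS.items() if tunnel & bit]
```

Inside a packetCb this would be called as tunnel_names(arkime_packet.get(packet, "tunnel")).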
run_ethernet_cb(batch, packet, packetBytes, type, description)
Continue processing a packet at the Ethernet layer. Calls the registered callback for the ethertype.
- batch: The opaque batch object from the packetCb.
- packet: The opaque packet object from the packetCb.
- packetBytes: The memoryview of packet bytes starting at the new ethernet header.
- type: The Ethertype of the inner packet (e.g., 0x0800=IPv4, 0x86DD=IPv6).
- description: A short description for logging/debugging (e.g., “myproto”).
Returns PacketRC: The result from the ethertype handler, or UNKNOWN_ETHER if no handler registered.
run_ip_cb(batch, packet, packetBytes, type, description)
Continue processing a packet at the IP layer. Calls the registered callback for the IP protocol.
- batch: The opaque batch object from the packetCb.
- packet: The opaque packet object from the packetCb.
- packetBytes: The memoryview of packet bytes starting at the IP header.
- type: The IP protocol number (e.g., 6=TCP, 17=UDP, 132=SCTP).
- description: A short description for logging/debugging (e.g., “myproto”).
Returns PacketRC: The result from the IP protocol handler, or UNKNOWN_IP if no handler registered.
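The sketch below shows one way a packetCb might strip an encapsulation header and hand the rest to run_ethernet_cb. The 4-byte header layout (2 bytes of flags, 2 bytes of inner ethertype), the ethertype 0xFF12, and the description "exampleencap" are hypothetical. It also assumes that the PacketRC values such as CORRUPT are attributes of the arkime_packet module and that the bytes after the header are what the handler for the inner ethertype expects.

```python
import struct

ENCAP_HEADER_LEN = 4  # hypothetical header: 2 bytes flags, 2 bytes ethertype


def split_encap(packet_bytes):
    """Return (inner ethertype, remaining bytes), or None if too short."""
    if len(packet_bytes) < ENCAP_HEADER_LEN:
        return None
    _flags, ethertype = struct.unpack_from("!HH", packet_bytes, 0)
    return ethertype, packet_bytes[ENCAP_HEADER_LEN:]


def make_packet_cb(packet_api):
    """Build a packetCb bound to the arkime_packet module (or a stand-in)."""

    def packet_cb(batch, packet, packet_bytes, packet_len):
        parsed = split_encap(packet_bytes)
        if parsed is None:
            return packet_api.CORRUPT  # assumed constant location
        ethertype, inner = parsed
        return packet_api.run_ethernet_cb(batch, packet, inner, ethertype, "exampleencap")

    return packet_cb


try:
    import arkime_packet  # assumed to be importable only inside Arkime capture
except ImportError:
    arkime_packet = None

if arkime_packet is not None:
    arkime_packet.set_ethernet_cb(0xFF12, make_packet_cb(arkime_packet))
```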
set(packet, field, value)
Set the value of a packet field. Only certain fields can be set.
- packet: The packet object from the packetCb.
- field: The string field name to set.
- etherOffset - Integer - Offset of ethernet header in packet
- mProtocol - Integer - The Arkime mProtocol number
- outerEtherOffset - Integer - Offset of outer ethernet header for tunneled packets
- outerIpOffset - Integer - Offset of outer IP header for tunneled packets
- outerv6 - Integer - 1 if outer IP is IPv6, 0 if IPv4
- payloadLen - Integer - Length of the payload
- payloadOffset - Integer - Offset of the payload in packet
- tunnel - Integer - Bitflags: 0x01=GRE, 0x02=PPPoE, 0x04=MPLS, 0x08=PPP, 0x10=GTP, 0x20=VXLAN, 0x40=VXLAN-GPE, 0x80=Geneve
- v6 - Integer - 1 if IP is IPv6, 0 if IPv4
- vlan - Integer - The first VLAN tag if present, 0 if not present
- vni - Integer - The VXLAN VNI if present, 0 if not present
- value: The integer value to set.
set_ethernet_cb(type, packetCb)
Register an ethertype packet callback. Called for packets of the given ethertype. Typically the callback strips headers and calls run_ip_cb or run_ethernet_cb.
- type: The Ethertype to register for (e.g., 0x0800=IPv4, 0x86DD=IPv6, 0x8100=VLAN).
- packetCb: The callback to call for packets of the given ethertype.
set_ip_cb(type, ipCb)
Register an IP protocol packet callback. Called for packets of the given IP protocol.
- type: The IP protocol number to register for (e.g., 6=TCP, 17=UDP, 47=GRE, 132=SCTP).
- ipCb: The callback to call for packets of the given protocol.
Example
Create a /opt/arkime/parsers/example.py file with the following content:
import arkime
import arkime_session
import arkime_packet
import sys

def my_parsers_cb(session, bytes, len, which):
    # Write code here to parse the bytes and extract information
    print("PARSER:", arkime_session.get(session, "ip.src"), ":", arkime_session.get(session, "port.src"), "->", arkime_session.get(session, "ip.dst"), ":", arkime_session.get(session, "port.dst"), "len", len, "which", which)
    # then you could set a field
    # arkime_session.add_string(session, pos, "my value")
    # A parser should return -1 to unregister itself, 0 to continue parsing
    return 0

def my_classify_callback(session, bytes, len, which):
    print("CLASSIFY:", arkime_session.get(session, "ip.src"), ":", arkime_session.get(session, "port.src"), "->", arkime_session.get(session, "ip.dst"), ":", arkime_session.get(session, "port.dst"), "len", len, "which", which)
    # Example of adding a tag
    arkime_session.add_tag(session, "python")
    # Do some kind of check to see if you want to classify this session, if so register
    arkime_session.register_parser(session, my_parsers_cb)

def my_pre_save_callback(session, final):
    print("PRE SAVE:", arkime_session.get(session, "ip.src"), ":", arkime_session.get(session, "port.src"), "->", arkime_session.get(session, "ip.dst"), ":", arkime_session.get(session, "port.dst"), "final", final)

def my_save_callback(session, final):
    print("SAVE:", arkime_session.get(session, "ip.src"), ":", arkime_session.get(session, "port.src"), "->", arkime_session.get(session, "ip.dst"), ":", arkime_session.get(session, "port.dst"), "final", final)

def my_ethernet_cb(batch, packet, bytes, len):
    print("ETHERNET:", "batch", batch, "packet", packet, "bytes", bytes, "len", len, "pktlen", arkime_packet.get(packet, "pktlen"))
    # Remove first 18 bytes of ethernet header and run ethernet callback again
    bytes = bytes[18:]
    return arkime_packet.run_ethernet_cb(batch, packet, bytes, 0, "example")

### Start ###
# Register a classifier. This example will match all TCP sessions
arkime.register_tcp_classifier("test", 0, bytes("", "ascii"), my_classify_callback)
arkime.register_pre_save(my_pre_save_callback)
arkime.register_save(my_save_callback)
arkime_packet.set_ethernet_cb(0xff12, my_ethernet_cb)

# Create a new field in the session we will be setting
pos = arkime.field_define("arkime_rulz", "kind:lotermfield;db:arkime_rulz")
print("VERSION", arkime.VERSION, "CONFIG_PREFIX", arkime.CONFIG_PREFIX, "POS", pos)