STIX 2 Python API Documentation¶
Welcome to the STIX 2 Python API’s documentation. This library is designed to help you work with STIX 2 content. For more information about STIX 2, see the website of the OASIS Cyber Threat Intelligence Technical Committee.
Get started with an overview of the library, then take a look at the guides and tutorials to see how to use it. For information about a specific class or function, see the API reference.
Overview¶
Goals¶
High-level goals/principles of the Python stix2 library:
- It should be as easy as possible (but no easier!) to perform common tasks of producing, consuming, and processing STIX 2 content.
- It should be hard, if not impossible, to emit invalid STIX 2.
- The library should default to doing “the right thing”, complying with both the STIX 2.0 spec and associated best practices. The library should make it hard to do “the wrong thing”.
Design Decisions¶
To accomplish these goals, and to incorporate lessons learned while developing python-stix (for STIX 1.x), several decisions influenced the design of the stix2 library:
- All data structures are immutable by default. In contrast to python-stix, where users would create an object and then assign attributes to it, in stix2 all properties must be provided when creating the object.
- Where necessary, library objects should act like dicts. When treated as a str, the JSON representation of the object should be used.
- Core Python data types (including numeric types and datetime) should be used when appropriate, and serialized to the correct format in JSON as specified in the STIX 2 spec.
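The last design decision, serializing core Python types to the format the spec expects, can be illustrated for timestamps. The sketch below uses only the standard library and is not the library's own serializer; the helper name format_stix_timestamp is hypothetical, and the millisecond precision is an assumption taken from the created and modified values shown in this documentation.

```python
from datetime import datetime, timezone


def format_stix_timestamp(dt):
    """Render a datetime as a UTC timestamp with millisecond precision and a trailing 'Z'."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000  # truncate microseconds to milliseconds
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis


stamp = format_stix_timestamp(
    datetime(2019, 5, 13, 13, 14, 48, 509629, tzinfo=timezone.utc)
)
print(stamp)
```

Callers keep working with ordinary datetime objects, and the conversion to a string happens only at the JSON boundary.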
Architecture¶
The stix2 library is divided into three logical layers, representing different levels of abstraction useful in different types of scripts and larger applications. It is possible to combine multiple layers in the same program, and the higher levels build on the layers below.
Object Layer¶
The lowest layer, the Object Layer, is where Python objects representing STIX 2 data types (such as SDOs, SROs, and Cyber Observable Objects, as well as non-top-level objects like External References, Kill Chain phases, and Cyber Observable extensions) are created, and can be serialized and deserialized to and from JSON representation.
This layer is appropriate for stand-alone scripts that produce or consume STIX 2 content, or can serve as a low-level data API for larger applications that need to represent STIX objects as Python classes.
At this level, non-embedded reference properties (those ending in _ref, such as the links from a Relationship object to its source and target objects) are not implemented as references between the Python objects themselves, but by simply having the same values in id and reference properties. There is no referential integrity maintained by the stix2 library.
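Because references are plain ID strings and nothing enforces that their targets exist, an application that needs referential integrity has to check it itself. The following is a minimal sketch of such a check over plain dictionaries; find_dangling_refs is a hypothetical helper, not part of stix2, and it only inspects top-level properties ending in _ref.

```python
def find_dangling_refs(objects):
    """Return (object id, property, missing id) for every *_ref value with no matching object."""
    known_ids = {obj["id"] for obj in objects}
    dangling = []
    for obj in objects:
        for prop, value in obj.items():
            if prop.endswith("_ref") and value not in known_ids:
                dangling.append((obj["id"], prop, value))
    return dangling


objects = [
    {"type": "indicator", "id": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae"},
    {
        "type": "relationship",
        "id": "relationship--80c174fa-36d1-47c2-9a9d-ce0c636bedcc",
        "relationship_type": "indicates",
        "source_ref": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae",
        # The malware object is deliberately left out of this collection.
        "target_ref": "malware--1f2aba70-f0ae-49cd-9267-6fcb1e43be67",
    },
]

dangling = find_dangling_refs(objects)
print(dangling)
```

Here the relationship's target_ref is reported because no object with that ID is in the collection.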
Environment Layer¶
The Environment Layer adds several components that make it easier to handle STIX 2 data as part of a larger application and as part of a larger cyber threat intelligence ecosystem.
- Data Sources represent locations from which STIX data can be retrieved, such as a TAXII server, database, or local filesystem. The Data Source API abstracts differences between these storage locations, giving a common API to get objects by ID or query by various properties, as well as allowing federated operations over multiple data sources.
- Similarly, Data Sink objects represent destinations for sending STIX data.
- An Object Factory provides a way to add common properties to all created objects (such as the same created_by_ref, or a StatementMarking with copyright information or terms of use for the STIX data).
Each of these components can be used individually, or combined as part of an Environment. These Environment objects allow different settings to be used by different users of a multi-user application (such as a web application).
For more information, check out this Environment tutorial.
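The idea behind an Object Factory, applying the same default properties to every object it creates, can be sketched in a few lines. This is an illustration of the pattern only: DefaultsFactory and its create method are hypothetical names, not the stix2 API, and dict stands in for a STIX object class so the sketch has no dependencies.

```python
class DefaultsFactory:
    """Hypothetical helper: merges shared default properties into every object it creates."""

    def __init__(self, **defaults):
        self._defaults = dict(defaults)

    def create(self, constructor, **kwargs):
        # Explicit keyword arguments win over the shared defaults.
        merged = {**self._defaults, **kwargs}
        return constructor(**merged)


factory = DefaultsFactory(
    created_by_ref="identity--311b2d2d-f010-4473-83ec-1edf84858f4c"
)

# Picks up the shared created_by_ref.
indicator = factory.create(dict, type="indicator", name="File hash for malware variant")

# Overrides the shared created_by_ref for this one object.
override = factory.create(
    dict,
    type="malware",
    created_by_ref="identity--f431f809-377b-45e0-aa1c-6a4751cae5ff",
)
print(indicator["created_by_ref"], override["created_by_ref"])
```

Keeping the defaults in one place is what lets different users of a multi-user application each work with their own settings.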
Workbench Layer¶
The highest layer of the stix2 APIs is the Workbench Layer, designed for a single user in a highly-interactive analytical environment (such as a Jupyter Notebook). It builds on the lower layers of the API, while hiding most of their complexity. Unlike the other layers, this layer is designed to be used directly by end users. For users who are comfortable with Python, the Workbench Layer makes it easy to quickly interact with STIX data from a variety of sources without needing to write and run one-off Python scripts. For more information, check out this Workbench tutorial.
User’s Guide¶
This section of documentation contains guides and tutorials on how to use the stix2 library.
Creating STIX Content¶
Creating STIX Domain Objects¶
To create a STIX object, provide keyword arguments to the type’s constructor:
[3]:
from stix2 import Indicator
indicator = Indicator(name="File hash for malware variant",
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
print(indicator)
[3]:
{
"type": "indicator",
"id": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae",
"created": "2019-05-13T13:14:48.509Z",
"modified": "2019-05-13T13:14:48.509Z",
"name": "File hash for malware variant",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-05-13T13:14:48.509629Z",
"labels": [
"malicious-activity"
]
}
Certain required attributes of all objects will be set automatically if not provided as keyword arguments:
- If not provided, type will be set automatically to the correct type. You can also provide the type explicitly, but this is not necessary:
[4]:
indicator2 = Indicator(type='indicator',
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
Passing a value for type that does not match the class being constructed will cause an error:
[5]:
indicator3 = Indicator(type='xxx',
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
InvalidValueError: Invalid value for Indicator 'type': must equal 'indicator'.
- If not provided, id will be generated randomly. If you provide an id argument, it must begin with the correct prefix:
[6]:
indicator4 = Indicator(id="campaign--63ce9068-b5ab-47fa-a2cf-a602ea01f21a",
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
InvalidValueError: Invalid value for Indicator 'id': must start with 'indicator--'.
For indicators, labels and pattern are required and cannot be set automatically. Trying to create an indicator that is missing one of these properties will result in an error:
[7]:
indicator = Indicator()
MissingPropertiesError: No values for required properties for Indicator: (labels, pattern).
However, the required valid_from attribute on Indicators will be set to the current time if not provided as a keyword argument.
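The identifier rules described above (a random ID when none is given, and a required type prefix when one is) can be sketched with only the standard library. This is not the library's validation code; make_id and check_id are hypothetical helpers, and the use of a random UUIDv4 is an assumption based on the IDs shown in this section.

```python
import uuid


def make_id(object_type):
    """Build an identifier of the form '<type>--<random UUIDv4>'."""
    return "%s--%s" % (object_type, uuid.uuid4())


def check_id(object_type, identifier):
    """Raise ValueError unless the identifier is '<type>--' followed by a well-formed UUID."""
    prefix = object_type + "--"
    if not identifier.startswith(prefix):
        raise ValueError("must start with '%s'." % prefix)
    uuid.UUID(identifier[len(prefix):])  # raises ValueError on a malformed UUID
    return identifier


new_id = make_id("indicator")
check_id("indicator", new_id)  # passes silently

try:
    check_id("indicator", "campaign--63ce9068-b5ab-47fa-a2cf-a602ea01f21a")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```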
Once created, the object acts like a frozen dictionary. Properties can be accessed using the standard Python dictionary syntax:
[8]:
indicator['name']
[8]:
'File hash for malware variant'
Or access properties using the standard Python attribute syntax:
[9]:
indicator.name
[9]:
'File hash for malware variant'
Warning
Note that there are several attributes on these objects used for method names. Accessing those will return a bound method, not the attribute value.
Attempting to modify any attributes will raise an error:
[10]:
indicator['name'] = "This is a revised name"
TypeError: 'Indicator' object does not support item assignment
[11]:
indicator.name = "This is a revised name"
ImmutableError: Cannot modify 'name' property in 'Indicator' after creation.
To update the properties of an object, see the Versioning section.
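The "frozen dictionary" behaviour described above can be approximated with a read-only mapping. The sketch below is not how stix2 implements its objects; FrozenObject is a hypothetical class built on collections.abc.Mapping that supports both access styles and rejects both kinds of assignment.

```python
from collections.abc import Mapping


class FrozenObject(Mapping):
    """Hypothetical read-only object: dict-style and attribute-style reads, no writes."""

    def __init__(self, **properties):
        # Bypass our own __setattr__ guard while storing the initial properties.
        object.__setattr__(self, "_properties", dict(properties))

    def __getitem__(self, key):
        return self._properties[key]

    def __iter__(self):
        return iter(self._properties)

    def __len__(self):
        return len(self._properties)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._properties[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise AttributeError("Cannot modify '%s' property after creation." % name)


obj = FrozenObject(name="File hash for malware variant")
print(obj["name"], obj.name)

# A property that shares its name with a Mapping method is shadowed by that method.
tricky = FrozenObject(keys="shadowed")
print(callable(tricky.keys), tricky["keys"])
```

The second object shows the situation the warning above describes: attribute access returns the bound method, so such a property is only reachable with dictionary syntax.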
Creating a Malware object follows the same pattern:
[12]:
from stix2 import Malware
malware = Malware(name="Poison Ivy",
labels=['remote-access-trojan'])
print(malware)
[12]:
{
"type": "malware",
"id": "malware--1f2aba70-f0ae-49cd-9267-6fcb1e43be67",
"created": "2019-05-13T13:15:04.698Z",
"modified": "2019-05-13T13:15:04.698Z",
"name": "Poison Ivy",
"labels": [
"remote-access-trojan"
]
}
As with indicators, the type, id, created, and modified properties will be set automatically if not provided. For Malware objects, the labels and name properties must be provided.
You can see the full list of SDO classes here.
Creating Relationships¶
STIX 2 Relationships are separate objects, not properties of the object on either side of the relationship. They are constructed similarly to other STIX objects. The type, id, created, and modified properties are added automatically if not provided. Callers must provide the relationship_type, source_ref, and target_ref properties.
[13]:
from stix2 import Relationship
relationship = Relationship(relationship_type='indicates',
source_ref=indicator.id,
target_ref=malware.id)
print(relationship)
[13]:
{
"type": "relationship",
"id": "relationship--80c174fa-36d1-47c2-9a9d-ce0c636bedcc",
"created": "2019-05-13T13:15:13.152Z",
"modified": "2019-05-13T13:15:13.152Z",
"relationship_type": "indicates",
"source_ref": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae",
"target_ref": "malware--1f2aba70-f0ae-49cd-9267-6fcb1e43be67"
}
The source_ref and target_ref properties can be either the IDs of other STIX objects, or the STIX objects themselves. For readability, Relationship objects can also be constructed with the source_ref, relationship_type, and target_ref as positional (non-keyword) arguments:
[14]:
relationship2 = Relationship(indicator, 'indicates', malware)
print(relationship2)
[14]:
{
"type": "relationship",
"id": "relationship--47395d23-dedd-45d4-8db1-c9ffaf44493d",
"created": "2019-05-13T13:15:16.566Z",
"modified": "2019-05-13T13:15:16.566Z",
"relationship_type": "indicates",
"source_ref": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae",
"target_ref": "malware--1f2aba70-f0ae-49cd-9267-6fcb1e43be67"
}
Creating Bundles¶
STIX Bundles can be created by passing objects as arguments to the Bundle constructor. All required properties (type, id, and spec_version) will be set automatically if not provided, or can be provided as keyword arguments:
[15]:
from stix2 import Bundle
bundle = Bundle(indicator, malware, relationship)
print(bundle)
[15]:
{
"type": "bundle",
"id": "bundle--388c9b2c-936c-420a-baa5-04f48d682a01",
"spec_version": "2.0",
"objects": [
{
"type": "indicator",
"id": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae",
"created": "2019-05-13T13:14:48.509Z",
"modified": "2019-05-13T13:14:48.509Z",
"name": "File hash for malware variant",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-05-13T13:14:48.509629Z",
"labels": [
"malicious-activity"
]
},
{
"type": "malware",
"id": "malware--1f2aba70-f0ae-49cd-9267-6fcb1e43be67",
"created": "2019-05-13T13:15:04.698Z",
"modified": "2019-05-13T13:15:04.698Z",
"name": "Poison Ivy",
"labels": [
"remote-access-trojan"
]
},
{
"type": "relationship",
"id": "relationship--80c174fa-36d1-47c2-9a9d-ce0c636bedcc",
"created": "2019-05-13T13:15:13.152Z",
"modified": "2019-05-13T13:15:13.152Z",
"relationship_type": "indicates",
"source_ref": "indicator--2f3d4926-163d-4aef-bcd2-19dea96916ae",
"target_ref": "malware--1f2aba70-f0ae-49cd-9267-6fcb1e43be67"
}
]
}
Creating Cyber Observable References¶
Cyber Observable Objects have properties that can reference other Cyber Observable Objects. To create those references, use the _valid_refs property as shown in the following examples. _valid_refs is necessary when creating references to Cyber Observable Objects because some embedded references can only point to certain types, and _valid_refs helps ensure consistency.
There are two cases.
Case 1: Specifying the type of the Cyber Observable Objects being referenced¶
In the following example, the IPv4Address object has its resolves_to_refs property specified. As per the spec, this property’s value must be a list of reference(s) to MACAddress objects. In this case, those references are strings that state the type of the Cyber Observable Object being referenced, and are provided in _valid_refs.
[16]:
from stix2 import IPv4Address
ip4 = IPv4Address(
_valid_refs={"1": "mac-addr", "2": "mac-addr"},
value="177.60.40.7",
resolves_to_refs=["1", "2"]
)
print(ip4)
[16]:
{
"type": "ipv4-addr",
"value": "177.60.40.7",
"resolves_to_refs": [
"1",
"2"
]
}
Case 2: Specifying the name of the Cyber Observable Objects being referenced¶
The following example is just like the one provided in Case 1 above, with one key difference: instead of using strings to specify the type of the Cyber Observable Objects being referenced in _valid_refs, the referenced Cyber Observable Objects are created beforehand and then the objects themselves are provided in _valid_refs.
[17]:
from stix2 import MACAddress
mac_addr_a = MACAddress(value="a1:b2:c3:d4:e5:f6")
mac_addr_b = MACAddress(value="a7:b8:c9:d0:e1:f2")
ip4_valid_refs = IPv4Address(
_valid_refs={"1": mac_addr_a, "2": mac_addr_b},
value="177.60.40.7",
resolves_to_refs=["1", "2"]
)
print(ip4_valid_refs)
[17]:
{
"type": "ipv4-addr",
"value": "177.60.40.7",
"resolves_to_refs": [
"1",
"2"
]
}
Custom STIX Content¶
Custom Properties¶
Attempting to create a STIX object with properties not defined by the specification will result in an error. Try creating an Identity object with a custom x_foo property:
[3]:
from stix2 import Identity
Identity(name="John Smith",
identity_class="individual",
x_foo="bar")
ExtraPropertiesError: Unexpected properties for Identity: (x_foo).
To create a STIX object with one or more custom properties, pass them in as a dictionary parameter called custom_properties:
[4]:
identity = Identity(name="John Smith",
identity_class="individual",
custom_properties={
"x_foo": "bar"
})
print(identity)
[4]:
{
"type": "identity",
"id": "identity--d6996982-5fb7-4364-b716-b618516989b6",
"created": "2020-03-05T05:06:27.349Z",
"modified": "2020-03-05T05:06:27.349Z",
"name": "John Smith",
"identity_class": "individual",
"x_foo": "bar"
}
Alternatively, setting allow_custom to True will allow custom properties without requiring a custom_properties dictionary.
[5]:
identity2 = Identity(name="John Smith",
identity_class="individual",
x_foo="bar",
allow_custom=True)
print(identity2)
[5]:
{
"type": "identity",
"id": "identity--a167d2de-9fc4-4734-a1ae-57a548aad22a",
"created": "2020-03-05T05:06:29.180Z",
"modified": "2020-03-05T05:06:29.180Z",
"name": "John Smith",
"identity_class": "individual",
"x_foo": "bar"
}
Likewise, when parsing STIX content with custom properties, pass allow_custom=True to parse():
[6]:
from stix2 import parse
input_string = """{
"type": "identity",
"id": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
"created": "2015-12-21T19:59:11Z",
"modified": "2015-12-21T19:59:11Z",
"name": "John Smith",
"identity_class": "individual",
"x_foo": "bar"
}"""
identity3 = parse(input_string, allow_custom=True)
print(identity3.x_foo)
[6]:
bar
To remove a custom property, use new_version() and set it to None.
[7]:
identity4 = identity3.new_version(x_foo=None)
print(identity4)
[7]:
{
"type": "identity",
"id": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
"created": "2015-12-21T19:59:11.000Z",
"modified": "2020-03-05T05:06:32.934Z",
"name": "John Smith",
"identity_class": "individual"
}
Custom STIX Object Types¶
To create a custom STIX object type, define a class with the @CustomObject decorator. It takes the type name and a list of property tuples, each tuple consisting of the property name and a property instance. Any special validation of the properties can be added by supplying an __init__ function.
Let’s say zoo animals have become a serious cyber threat and we want to model them in STIX using a custom object type. Let’s use a species property to store the kind of animal, and make that property required. We also want a property to store the class of animal, such as “mammal” or “bird” but only want to allow specific values in it. We can add some logic to validate this property in __init__.
[8]:
from stix2 import CustomObject, properties
@CustomObject('x-animal', [
    ('species', properties.StringProperty(required=True)),
    ('animal_class', properties.StringProperty()),
])
class Animal(object):
    def __init__(self, animal_class=None, **kwargs):
        if animal_class and animal_class not in ['mammal', 'bird', 'fish', 'reptile']:
            raise ValueError("'%s' is not a recognized class of animal." % animal_class)
Now we can create an instance of our custom Animal type.
[9]:
animal = Animal(species="lion",
animal_class="mammal")
print(animal)
[9]:
{
"type": "x-animal",
"id": "x-animal--1f7ce0ad-fd3a-4cf0-9cd7-13f7bef9ecd4",
"created": "2020-03-05T05:06:38.010Z",
"modified": "2020-03-05T05:06:38.010Z",
"species": "lion",
"animal_class": "mammal"
}
Trying to create an Animal instance with an animal_class that’s not in the list will result in an error:
[10]:
Animal(species="xenomorph",
animal_class="alien")
ValueError: 'alien' is not a recognized class of animal.
Parsing custom object types that you have already defined is simple and no different from parsing any other STIX object.
[11]:
input_string2 = """{
"type": "x-animal",
"id": "x-animal--941f1471-6815-456b-89b8-7051ddf13e4b",
"created": "2015-12-21T19:59:11Z",
"modified": "2015-12-21T19:59:11Z",
"species": "shark",
"animal_class": "fish"
}"""
animal2 = parse(input_string2)
print(animal2.species)
[11]:
shark
However, parsing custom object types which you have not defined will result in an error:
[12]:
input_string3 = """{
"type": "x-foobar",
"id": "x-foobar--d362beb5-a04e-4e6b-a030-b6935122c3f9",
"created": "2015-12-21T19:59:11Z",
"modified": "2015-12-21T19:59:11Z",
"bar": 1,
"baz": "frob"
}"""
parse(input_string3)
ParseError: Can't parse unknown object type 'x-foobar'! For custom types, use the CustomObject decorator.
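The behaviour above, where defined types parse and undefined ones fail, follows naturally from a registry that a decorator fills in. The sketch below shows that general pattern with the standard library only; it is not the stix2 implementation, and register, parse_object, and _REGISTRY are hypothetical names.

```python
import json

_REGISTRY = {}  # maps a 'type' value to the class that handles it


def register(type_name):
    """Hypothetical decorator: remember which class handles a given 'type' value."""
    def decorator(cls):
        _REGISTRY[type_name] = cls
        return cls
    return decorator


def parse_object(text):
    """Decode JSON and build an instance of whichever class is registered for its 'type'."""
    data = json.loads(text)
    try:
        cls = _REGISTRY[data["type"]]
    except KeyError:
        raise ValueError("Can't parse unknown object type '%s'!" % data["type"]) from None
    return cls(**data)


@register("x-animal")
class Animal:
    def __init__(self, **properties):
        self.properties = properties


animal = parse_object('{"type": "x-animal", "species": "shark"}')
print(animal.properties["species"])

try:
    parse_object('{"type": "x-foobar", "bar": 1}')
except ValueError as exc:
    unknown_error = str(exc)
    print(unknown_error)
```

Defining the class is what makes the type known, which is why the custom type must be declared before any content of that type is parsed.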
Custom Cyber Observable Types¶
Similar to custom STIX object types, use a decorator to create custom Cyber Observable types. Just as before, __init__() can hold additional validation, but it is not necessary.
[13]:
from stix2 import CustomObservable
@CustomObservable('x-new-observable', [
    ('a_property', properties.StringProperty(required=True)),
    ('property_2', properties.IntegerProperty()),
])
class NewObservable():
    pass
new_observable = NewObservable(a_property="something",
property_2=10)
print(new_observable)
[13]:
{
"type": "x-new-observable",
"a_property": "something",
"property_2": 10
}
Likewise, after the custom Cyber Observable type has been defined, it can be parsed.
[14]:
from stix2 import ObservedData
input_string4 = """{
"type": "observed-data",
"id": "observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf",
"created_by_ref": "identity--f431f809-377b-45e0-aa1c-6a4751cae5ff",
"created": "2016-04-06T19:58:16.000Z",
"modified": "2016-04-06T19:58:16.000Z",
"first_observed": "2015-12-21T19:00:00Z",
"last_observed": "2015-12-21T19:00:00Z",
"number_observed": 50,
"objects": {
"0": {
"type": "x-new-observable",
"a_property": "foobaz",
"property_2": 5
}
}
}"""
obs_data = parse(input_string4)
print(obs_data.objects["0"].a_property)
print(obs_data.objects["0"].property_2)
[14]:
foobaz
[14]:
5
ID-Contributing Properties for Custom Cyber Observables¶
STIX 2.1 Cyber Observables (SCOs) have deterministic IDs, meaning that the ID of a SCO is based on the values of some of its properties. Thus, if multiple cyber observables of the same type have the same values for their ID-contributing properties, then these SCOs will have the same ID. UUIDv5 is used for the deterministic IDs, using the namespace "00abedb4-aa42-466c-9c01-fed23315a9b7". A SCO’s ID-contributing properties may consist of a combination of required properties and optional properties.
If a SCO type does not have any ID-contributing properties defined, or none of the ID-contributing properties are present on the object, then the SCO uses a randomly-generated UUIDv4. Thus, you can optionally define which of your custom SCO’s properties should be ID-contributing properties. Similar to standard SCOs, your custom SCO’s ID-contributing properties can be any combination of the SCO’s required and optional properties.
You define the ID-contributing properties when defining your custom SCO with the CustomObservable decorator. After the list of properties, you can optionally define the list of ID-contributing properties. If you do not want to specify any ID-contributing properties for your custom SCO, then you do not need to do anything additional.
See the example below:
[15]:
from stix2.v21 import CustomObservable # IDs and Deterministic IDs are NOT part of STIX 2.0 Custom Observables
@CustomObservable('x-new-observable-2', [
    ('a_property', properties.StringProperty(required=True)),
    ('property_2', properties.IntegerProperty()),
], [
    'a_property'
])
class NewObservable2():
    pass
new_observable_a = NewObservable2(a_property="A property", property_2=2000)
print(new_observable_a)
new_observable_b = NewObservable2(a_property="A property", property_2=3000)
print(new_observable_b)
new_observable_c = NewObservable2(a_property="A different property", property_2=3000)
print(new_observable_c)
[15]:
{
"type": "x-new-observable-2",
"id": "x-new-observable-2--6bc655d6-dcb8-52a3-a862-46848c17e599",
"a_property": "A property",
"property_2": 2000
}
[15]:
{
"type": "x-new-observable-2",
"id": "x-new-observable-2--6bc655d6-dcb8-52a3-a862-46848c17e599",
"a_property": "A property",
"property_2": 3000
}
[15]:
{
"type": "x-new-observable-2",
"id": "x-new-observable-2--1e56f9c3-a73b-5fbd-b348-83c76523c4df",
"a_property": "A different property",
"property_2": 3000
}
In this example, a_property is the only ID-contributing property. Notice that the ID for new_observable_a and new_observable_b is the same since they have the same value for the ID-contributing a_property property.
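The mechanism behind deterministic IDs can be sketched with the standard uuid module. The namespace is the one quoted in this section; everything else is an assumption made for illustration. In particular, sketch_sco_id is a hypothetical helper, and the sorted-key JSON string it hashes stands in for whatever canonical form the library actually uses, so the IDs it produces will not match the ones printed above.

```python
import json
import uuid

# Namespace quoted in the text of this section.
SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def sketch_sco_id(object_type, properties, id_contributing):
    """Derive a UUIDv5 from the ID-contributing properties, or fall back to a random UUIDv4."""
    present = {name: properties[name] for name in id_contributing if name in properties}
    if not present:
        # No ID-contributing values available: use a random ID instead.
        return "%s--%s" % (object_type, uuid.uuid4())
    # Assumption: compact, sorted-key JSON approximates a canonical serialization.
    canonical = json.dumps(present, sort_keys=True, separators=(",", ":"))
    return "%s--%s" % (object_type, uuid.uuid5(SCO_NAMESPACE, canonical))


id_a = sketch_sco_id("x-new-observable-2", {"a_property": "A property", "property_2": 2000}, ["a_property"])
id_b = sketch_sco_id("x-new-observable-2", {"a_property": "A property", "property_2": 3000}, ["a_property"])
id_c = sketch_sco_id("x-new-observable-2", {"a_property": "A different property", "property_2": 3000}, ["a_property"])

print(id_a == id_b)  # same ID-contributing value
print(id_a == id_c)  # different ID-contributing value
```

As in the library example, property_2 has no influence on the ID because it is not listed as ID-contributing.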
Custom Cyber Observable Extensions¶
Finally, custom extensions to existing Cyber Observable types can also be created. Just use the @CustomExtension decorator. Note that you must provide the Cyber Observable class to which the extension applies. Again, any extra validation of the properties can be implemented by providing an __init__(), but it is not required. Let’s say we want to make an extension to the File Cyber Observable Object:
[16]:
from stix2 import File, CustomExtension
@CustomExtension(File, 'x-new-ext', [
    ('property1', properties.StringProperty(required=True)),
    ('property2', properties.IntegerProperty()),
])
class NewExtension():
    pass
new_ext = NewExtension(property1="something",
property2=10)
print(new_ext)
[16]:
{
"property1": "something",
"property2": 10
}
Once the custom Cyber Observable extension has been defined, it can be parsed.
[17]:
input_string5 = """{
"type": "observed-data",
"id": "observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf",
"created_by_ref": "identity--f431f809-377b-45e0-aa1c-6a4751cae5ff",
"created": "2016-04-06T19:58:16.000Z",
"modified": "2016-04-06T19:58:16.000Z",
"first_observed": "2015-12-21T19:00:00Z",
"last_observed": "2015-12-21T19:00:00Z",
"number_observed": 50,
"objects": {
"0": {
"type": "file",
"name": "foo.bar",
"hashes": {
"SHA-256": "35a01331e9ad96f751278b891b6ea09699806faedfa237d40513d92ad1b7100f"
},
"extensions": {
"x-new-ext": {
"property1": "bla",
"property2": 50
}
}
}
}
}"""
obs_data2 = parse(input_string5)
print(obs_data2.objects["0"].extensions["x-new-ext"].property1)
print(obs_data2.objects["0"].extensions["x-new-ext"].property2)
[17]:
bla
[17]:
50
DataStore API¶
The stix2 library features an interface for pulling and pushing STIX 2 content. This interface consists of DataStore, DataSource and DataSink constructs: a DataSource for pulling STIX 2 content, a DataSink for pushing STIX 2 content, and a DataStore for both pulling and pushing.
The DataStore, DataSource, DataSink (collectively referred to as the “DataStore suite”) APIs are not referenced directly by a user but are used as base classes, which are then subclassed by real DataStore suites. The stix2 library provides the DataStore suites of FileSystem, Memory, and TAXII. Users are also encouraged to subclass the base classes and create their own custom DataStore suites.
CompositeDataSource¶
CompositeDataSource is an available controller that can be used as a single interface to a set of defined DataSources. The purpose of this controller is to allow for the grouping of DataSources and making get()/query() calls to a set of DataSources in one API call.
CompositeDataSources can be used to organize/group DataSources, federate get()/all_versions()/query() calls, and reduce user code.
CompositeDataSource is just a wrapper around a set of defined DataSources (e.g. FileSystemSource) that federates get()/all_versions()/query() calls individually to each of the attached DataSources, collects the results from each DataSource and returns them.
Filters can be attached to CompositeDataSources just as they can be done to DataStores and DataSources. When get()/all_versions()/query() calls are made to the CompositeDataSource, it will pass along any query filters from the call and any of its own filters to the attached DataSources. In addition, those DataSources may have their own attached filters as well. The effect is that all the filters are eventually combined when the get()/all_versions()/query() call is actually executed within a DataSource.
A CompositeDataSource can also be attached to a CompositeDataSource for multiple layers of grouped DataSources.
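The federation described above can be sketched with two small classes. This is an illustration of the idea rather than the library's code: DictSource and CompositeSketch are hypothetical, and returning the version with the latest modified timestamp from get() is an assumption made for this sketch.

```python
class DictSource:
    """Hypothetical stand-in for a data source backed by an in-memory list."""

    def __init__(self, objects):
        self._objects = list(objects)

    def all_versions(self, stix_id):
        return [obj for obj in self._objects if obj["id"] == stix_id]


class CompositeSketch:
    """Forwards calls to every attached source and merges the results."""

    def __init__(self):
        self._sources = []

    def add_data_sources(self, sources):
        self._sources.extend(sources)

    def all_versions(self, stix_id):
        results = []
        for source in self._sources:
            results.extend(source.all_versions(stix_id))
        return results

    def get(self, stix_id):
        versions = self.all_versions(stix_id)
        if not versions:
            return None
        # Assumption: timestamps share one format, so string comparison orders them.
        return max(versions, key=lambda obj: obj["modified"])


fs = DictSource([{
    "id": "intrusion-set--f3bdec95-3d62-42d9-a840-29630f6cdc1a",
    "modified": "2017-05-31T21:31:53.197Z",
    "name": "DragonOK",
}])
ts = DictSource([{
    "id": "indicator--02b90f02-a96a-43ee-88f1-1e87297941f2",
    "modified": "2017-11-13T07:00:24.000Z",
    "name": "Ransomware IP Blocklist",
}])

cs = CompositeSketch()
cs.add_data_sources([fs, ts])

# A composite exposes the same methods as a source, so composites can be nested.
outer = CompositeSketch()
outer.add_data_sources([cs])

print(cs.get("intrusion-set--f3bdec95-3d62-42d9-a840-29630f6cdc1a")["name"])
print(outer.get("indicator--02b90f02-a96a-43ee-88f1-1e87297941f2")["name"])
```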
CompositeDataSource API¶
CompositeDataSource Examples¶
[4]:
from taxii2client import Collection
from stix2 import CompositeDataSource, FileSystemSource, TAXIICollectionSource
# create FileSystemSource
fs = FileSystemSource("/tmp/stix2_source")
# create TAXIICollectionSource
colxn = Collection('http://127.0.0.1:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/')
ts = TAXIICollectionSource(colxn)
# add them both to the CompositeDataSource
cs = CompositeDataSource()
cs.add_data_sources([fs,ts])
# get an object that is only in the filesystem
intrusion_set = cs.get('intrusion-set--f3bdec95-3d62-42d9-a840-29630f6cdc1a')
print(intrusion_set)
# get an object that is only in the TAXII collection
ind = cs.get('indicator--02b90f02-a96a-43ee-88f1-1e87297941f2')
print(ind)
[4]:
{
"type": "intrusion-set",
"id": "intrusion-set--f3bdec95-3d62-42d9-a840-29630f6cdc1a",
"created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
"created": "2017-05-31T21:31:53.197Z",
"modified": "2017-05-31T21:31:53.197Z",
"name": "DragonOK",
"description": "DragonOK is a threat group that has targeted Japanese organizations with phishing emails. Due to overlapping TTPs, including similar custom tools, DragonOK is thought to have a direct or indirect relationship with the threat group Moafee. [[Citation: Operation Quantum Entanglement]][[Citation: Symbiotic APT Groups]] It is known to use a variety of malware, including Sysget/HelloBridge, PlugX, PoisonIvy, FormerFirstRat, NFlog, and NewCT. [[Citation: New DragonOK]]",
"aliases": [
"DragonOK"
],
"external_references": [
{
"source_name": "mitre-attack",
"url": "https://attack.mitre.org/wiki/Group/G0017",
"external_id": "G0017"
},
{
"source_name": "Operation Quantum Entanglement",
"description": "Haq, T., Moran, N., Vashisht, S., Scott, M. (2014, September). OPERATION QUANTUM ENTANGLEMENT. Retrieved November 4, 2015.",
"url": "https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/wp-operation-quantum-entanglement.pdf"
},
{
"source_name": "Symbiotic APT Groups",
"description": "Haq, T. (2014, October). An Insight into Symbiotic APT Groups. Retrieved November 4, 2015.",
"url": "https://dl.mandiant.com/EE/library/MIRcon2014/MIRcon%202014%20R&D%20Track%20Insight%20into%20Symbiotic%20APT.pdf"
},
{
"source_name": "New DragonOK",
"description": "Miller-Osborn, J., Grunzweig, J.. (2015, April). Unit 42 Identifies New DragonOK Backdoor Malware Deployed Against Japanese Targets. Retrieved November 4, 2015.",
"url": "http://researchcenter.paloaltonetworks.com/2015/04/unit-42-identifies-new-dragonok-backdoor-malware-deployed-against-japanese-targets/"
}
],
"object_marking_refs": [
"marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
]
}
[4]:
{
"type": "indicator",
"id": "indicator--02b90f02-a96a-43ee-88f1-1e87297941f2",
"created": "2017-11-13T07:00:24.000Z",
"modified": "2017-11-13T07:00:24.000Z",
"name": "Ransomware IP Blocklist",
"description": "IP Blocklist address from abuse.ch",
"pattern": "[ ipv4-addr:value = '91.237.247.24' ]",
"valid_from": "2017-11-13T07:00:24Z",
"labels": [
"malicious-activity",
"Ransomware",
"Botnet",
"C&C"
],
"external_references": [
{
"source_name": "abuse.ch",
"url": "https://ransomwaretracker.abuse.ch/blocklist/"
}
]
}
Filters¶
The stix2 DataStore suites - FileSystem, Memory, and TAXII - all use the Filters module to allow for the querying of STIX content. Filters can be used to explicitly include or exclude results with certain criteria. For example:
- only trust content from a set of object creators
- exclude content from certain (untrusted) object creators
- only include content with a confidence above a certain threshold (once confidence is added to STIX 2)
- only return content that can be shared with external parties (e.g. only content that has TLP:GREEN markings)
Filters can be created and supplied with every call to query(), and/or attached to a DataStore so that every future query placed to that DataStore is evaluated against the attached filters, supplemented with any further filters supplied with the query call. Attached filters can also be removed from DataStores.
Filters are simple: each consists of a field name, a comparison operator, and an object property value (i.e. the value to compare to). All properties of STIX 2 objects can be filtered on. In addition, TAXII 2 filtering parameters for fields can also be used in filters.
TAXII2 filter fields:
- added_after
- id
- type
- version
Supported operators:
- =
- !=
- in
- >
- <
- >=
- <=
- contains
Value types of the property values must be one of these (Python) types:
- bool
- dict
- float
- int
- list
- str
- tuple
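The field/operator/value structure described above can be sketched as a small evaluator over plain dictionaries. This is not the library's Filter class: FilterSketch, matches, and apply_filters are hypothetical names, and the sketch only looks at top-level properties, so dotted paths and the library's handling of list-valued properties are out of scope.

```python
import operator
from collections import namedtuple

FilterSketch = namedtuple("FilterSketch", ["property", "op", "value"])

# One callable per supported operator; each takes (object's value, filter's value).
_OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "in": lambda actual, expected: actual in expected,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": lambda actual, expected: expected in actual,
}


def matches(obj, flt):
    """Return True if the object's top-level property satisfies the filter."""
    if flt.property not in obj:
        return False
    return _OPERATORS[flt.op](obj[flt.property], flt.value)


def apply_filters(objects, filters):
    """Keep only the objects that satisfy every filter."""
    return [obj for obj in objects if all(matches(obj, f) for f in filters)]


objects = [
    {"type": "indicator", "revoked": False, "labels": ["malicious-activity"]},
    {"type": "attack-pattern", "revoked": False, "labels": []},
]

not_attack_patterns = apply_filters(objects, [FilterSketch("type", "!=", "attack-pattern")])
by_type_set = apply_filters(objects, [FilterSketch("type", "in", ["indicator", "malware"])])
by_label = apply_filters(objects, [FilterSketch("labels", "contains", "malicious-activity")])
print(len(not_attack_patterns), len(by_type_set), len(by_label))
```

Because every filter must pass, attaching filters to a store and supplying more with a query simply lengthens the list that is checked.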
Filter Examples¶
[3]:
import sys
from stix2 import Filter
# create filter for STIX objects that have external references to MITRE ATT&CK framework
f = Filter("external_references.source_name", "=", "mitre-attack")
# create filter for STIX objects that are not of SDO type Attack-Pattern
f1 = Filter("type", "!=", "attack-pattern")
# create filter for STIX objects that have the "threat-report" label
f2 = Filter("labels", "in", "threat-report")
# create filter for STIX objects that have been modified past the timestamp
f3 = Filter("modified", ">=", "2017-01-28T21:33:10.772474Z")
# create filter for STIX objects that have been revoked
f4 = Filter("revoked", "=", True)
For filters to be applied to a query, they must either be supplied with the query call or attached to a DataSource, whether that DataSource is part of a DataStore or stands on its own.
[6]:
from stix2 import MemoryStore, FileSystemStore, FileSystemSource
fs = FileSystemStore("/tmp/stix2_store")
fs_source = FileSystemSource("/tmp/stix2_source")
# attach filter to FileSystemStore
fs.source.filters.add(f)
# attach multiple filters to FileSystemStore
fs.source.filters.add([f1,f2])
# can also attach filters to a Source
# attach multiple filters to FileSystemSource
fs_source.filters.add([f3, f4])
mem = MemoryStore()
# As it is impractical to only use MemorySink or MemorySource,
# attach a filter to a MemoryStore
mem.source.filters.add(f)
# attach multiple filters to a MemoryStore
mem.source.filters.add([f1,f2])
Note: The defanged property is now always included (implicitly) for STIX 2.1 Cyber Observable Objects (SCOs)
This is important to remember if you are writing a filter that involves checking the objects property of a STIX 2.1 ObservedData object. If any of the objects associated with the objects property are STIX 2.1 SCOs, then your filter must include the defanged property. For an example, refer to filters[14] & filters[15] in stix2/test/v21/test_datastore_filters.py.
De-Referencing Relationships¶
Given a STIX object, there are several ways to find other STIX objects related to it. To illustrate this, let’s first create a DataStore and add some objects and relationships.
[10]:
from stix2 import Campaign, Identity, Indicator, Malware, Relationship
mem = MemoryStore()
cam = Campaign(name='Charge', description='Attack!')
idy = Identity(name='John Doe', identity_class="individual")
ind = Indicator(labels=['malicious-activity'], pattern="[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']")
mal = Malware(labels=['ransomware'], name="Cryptolocker", created_by_ref=idy)
rel1 = Relationship(ind, 'indicates', mal,)
rel2 = Relationship(mal, 'targets', idy)
rel3 = Relationship(cam, 'uses', mal)
mem.add([cam, idy, ind, mal, rel1, rel2, rel3])
If a STIX object has a created_by_ref property, you can use the creator_of() method to retrieve the Identity object that created it.
[11]:
print(mem.creator_of(mal))
[11]:
{
"type": "identity",
"id": "identity--b67cf8d4-cc1a-4bb7-9402-fffcff17c9a9",
"created": "2018-04-05T20:43:54.117Z",
"modified": "2018-04-05T20:43:54.117Z",
"name": "John Doe",
"identity_class": "individual"
}
Use the relationships() method to retrieve all the relationship objects that reference a STIX object.
[12]:
rels = mem.relationships(mal)
len(rels)
[12]:
3
You can limit it to only specific relationship types:
[13]:
mem.relationships(mal, relationship_type='indicates')
[13]:
[Relationship(type='relationship', id='relationship--3b9cb248-5c2c-425d-85d0-680bfef6e69d', created='2018-04-05T20:43:54.134Z', modified='2018-04-05T20:43:54.134Z', relationship_type='indicates', source_ref='indicator--61deb2a5-305a-490e-83b3-9839a9677368', target_ref='malware--9fe343d8-edf7-4f4a-bb6c-a221fb75142d')]
You can limit it to only relationships where the given object is the source:
[14]:
mem.relationships(mal, source_only=True)
[14]:
[Relationship(type='relationship', id='relationship--8d322508-423b-4d51-be85-a95ad083f8af', created='2018-04-05T20:43:54.134Z', modified='2018-04-05T20:43:54.134Z', relationship_type='targets', source_ref='malware--9fe343d8-edf7-4f4a-bb6c-a221fb75142d', target_ref='identity--b67cf8d4-cc1a-4bb7-9402-fffcff17c9a9')]
And you can limit it to only relationships where the given object is the target:
[15]:
mem.relationships(mal, target_only=True)
[15]:
[Relationship(type='relationship', id='relationship--3b9cb248-5c2c-425d-85d0-680bfef6e69d', created='2018-04-05T20:43:54.134Z', modified='2018-04-05T20:43:54.134Z', relationship_type='indicates', source_ref='indicator--61deb2a5-305a-490e-83b3-9839a9677368', target_ref='malware--9fe343d8-edf7-4f4a-bb6c-a221fb75142d'),
Relationship(type='relationship', id='relationship--93e5afe0-d1fb-4315-8d08-10951f7a99b6', created='2018-04-05T20:43:54.134Z', modified='2018-04-05T20:43:54.134Z', relationship_type='uses', source_ref='campaign--edfd885c-bc31-4051-9bc2-08e057542d56', target_ref='malware--9fe343d8-edf7-4f4a-bb6c-a221fb75142d')]
Finally, you can retrieve all STIX objects related to a given STIX object using related_to(). This calls relationships() but then performs the extra step of getting the objects that these Relationships point to. related_to() takes all the same arguments that relationships() does.
[16]:
mem.related_to(mal, target_only=True, relationship_type='uses')
[16]:
[Campaign(type='campaign', id='campaign--edfd885c-bc31-4051-9bc2-08e057542d56', created='2018-04-05T20:43:54.117Z', modified='2018-04-05T20:43:54.117Z', name='Charge', description='Attack!')]
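To illustrate how related_to() builds on relationships(), here is a minimal sketch over plain dictionaries. It is not the library's implementation: the function bodies, the shortened IDs, and the objects_by_id lookup table are assumptions made for this example, and only the argument names mirror the calls shown above.

```python
def relationships(rels, obj_id, relationship_type=None,
                  source_only=False, target_only=False):
    """Return the relationship dicts that reference obj_id."""
    found = []
    for rel in rels:
        if relationship_type and rel["relationship_type"] != relationship_type:
            continue
        is_source = rel["source_ref"] == obj_id
        is_target = rel["target_ref"] == obj_id
        if (is_source and not target_only) or (is_target and not source_only):
            found.append(rel)
    return found


def related_to(rels, objects_by_id, obj_id, **kwargs):
    """Find matching relationships, then fetch the objects at the other end."""
    other_ids = set()
    for rel in relationships(rels, obj_id, **kwargs):
        other_ids.add(rel["source_ref"])
        other_ids.add(rel["target_ref"])
    other_ids.discard(obj_id)
    return [objects_by_id[i] for i in sorted(other_ids) if i in objects_by_id]


# Toy data mirroring the example above (IDs shortened for readability).
objects_by_id = {
    "campaign--1": {"type": "campaign", "id": "campaign--1"},
    "identity--1": {"type": "identity", "id": "identity--1"},
    "indicator--1": {"type": "indicator", "id": "indicator--1"},
    "malware--1": {"type": "malware", "id": "malware--1"},
}
rels = [
    {"relationship_type": "indicates", "source_ref": "indicator--1", "target_ref": "malware--1"},
    {"relationship_type": "targets", "source_ref": "malware--1", "target_ref": "identity--1"},
    {"relationship_type": "uses", "source_ref": "campaign--1", "target_ref": "malware--1"},
]

print(len(relationships(rels, "malware--1")))  # 3
print(related_to(rels, objects_by_id, "malware--1",
                 target_only=True, relationship_type="uses"))
```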
Using Environments¶
An Environment object makes it easier to use STIX 2 content as part of a larger application or ecosystem. It allows you to abstract away the nasty details of sending and receiving STIX data, and to create STIX objects with default values for common properties.
Storing and Retrieving STIX Content¶
An Environment can be set up with a DataStore if you want to store and retrieve STIX content from the same place.
[1]:
from stix2 import Environment, MemoryStore
env = Environment(store=MemoryStore())
If desired, you can instead set up an Environment with different data sources and sinks. In the following example we set up an environment that retrieves objects from memory and a directory on the filesystem, and stores objects in a different directory on the filesystem.
[6]:
from stix2 import CompositeDataSource, FileSystemSink, FileSystemSource, MemorySource
src = CompositeDataSource()
src.add_data_sources([MemorySource(), FileSystemSource("/tmp/stix2_source")])
env2 = Environment(source=src,
sink=FileSystemSink("/tmp/stix2_sink"))
Once you have an Environment you can store some STIX content in its DataSinks with add():
[7]:
from stix2 import Indicator
indicator = Indicator(id="indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7",
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
env.add(indicator)
You can retrieve STIX objects from the DataSources in the Environment with get(), query(), all_versions(), creator_of(), related_to(), and relationships() just as you would for a DataSource.
[8]:
print(env.get("indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7"))
[8]:
{
"type": "indicator",
"id": "indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7",
"created": "2018-04-05T19:27:53.923Z",
"modified": "2018-04-05T19:27:53.923Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:27:53.923548Z",
"labels": [
"malicious-activity"
]
}
Creating STIX Objects With Defaults¶
To create STIX objects with default values for certain properties, use an ObjectFactory. For instance, say we want all objects we create to have a created_by_ref property pointing to the Identity object representing our organization.
[13]:
from stix2 import Indicator, ObjectFactory
factory = ObjectFactory(created_by_ref="identity--311b2d2d-f010-4473-83ec-1edf84858f4c")
Once you’ve set up the ObjectFactory, use its create() method, passing in the class for the type of object you wish to create, followed by the other properties and their values for the object.
[14]:
ind = factory.create(Indicator,
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
print(ind)
[14]:
{
"type": "indicator",
"id": "indicator--c1b421c0-9c6b-4276-9b73-1b8684a5a0d2",
"created_by_ref": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
"created": "2018-04-05T19:28:48.776Z",
"modified": "2018-04-05T19:28:48.776Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:28:48.776442Z",
"labels": [
"malicious-activity"
]
}
All objects we create with that ObjectFactory will automatically get the default value for created_by_ref. These are the properties for which defaults can be set:
- created_by_ref
- created
- external_references
- object_marking_refs
These defaults can be bypassed. For example, say you have an Environment with multiple default values but want to create an object with a different value for created_by_ref, or none at all.
[15]:
factory2 = ObjectFactory(created_by_ref="identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
created="2017-09-25T18:07:46.255472Z")
env2 = Environment(factory=factory2)
ind2 = env2.create(Indicator,
created_by_ref=None,
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
print(ind2)
[15]:
{
"type": "indicator",
"id": "indicator--30a3b39c-5f57-4e7f-9eaf-e1abcb643da4",
"created": "2017-09-25T18:07:46.255Z",
"modified": "2017-09-25T18:07:46.255Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:28:53.268567Z",
"labels": [
"malicious-activity"
]
}
[16]:
ind3 = env2.create(Indicator,
created_by_ref="identity--962cabe5-f7f3-438a-9169-585a8c971d12",
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
print(ind3)
[16]:
{
"type": "indicator",
"id": "indicator--6c5bbaaf-6dac-44b0-a0df-86c27b3f6ecb",
"created_by_ref": "identity--962cabe5-f7f3-438a-9169-585a8c971d12",
"created": "2017-09-25T18:07:46.255Z",
"modified": "2017-09-25T18:07:46.255Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:29:56.55129Z",
"labels": [
"malicious-activity"
]
}
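The override behavior seen in the two outputs above can be pictured as a dictionary merge: explicit arguments win over factory defaults, and a value of None removes the property. The sketch below is only a model of that behavior, not the ObjectFactory code; create_with_defaults is a hypothetical helper.

```python
def create_with_defaults(defaults, **kwargs):
    """Merge factory defaults with per-object properties.

    Explicit keyword arguments take precedence over defaults, and a value
    of None drops the property from the result entirely.
    """
    props = dict(defaults)
    props.update(kwargs)
    return {key: value for key, value in props.items() if value is not None}


defaults = {
    "created_by_ref": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
    "created": "2017-09-25T18:07:46.255472Z",
}

# Default is kept when nothing overrides it.
kept = create_with_defaults(defaults, labels=["malicious-activity"])
# Passing None removes the default.
removed = create_with_defaults(defaults, created_by_ref=None)
# Passing another value replaces the default.
replaced = create_with_defaults(
    defaults, created_by_ref="identity--962cabe5-f7f3-438a-9169-585a8c971d12")

print("created_by_ref" in removed)  # False
print(replaced["created_by_ref"])
```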
For the full power of the Environment layer, create an Environment with both a DataStore/Source/Sink and an ObjectFactory:
[17]:
environ = Environment(ObjectFactory(created_by_ref="identity--311b2d2d-f010-4473-83ec-1edf84858f4c"),
MemoryStore())
i = environ.create(Indicator,
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
environ.add(i)
print(environ.get(i.id))
[17]:
{
"type": "indicator",
"id": "indicator--d1b8c3f6-1de1-44c1-b079-3df307224a0d",
"created_by_ref": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
"created": "2018-04-05T19:29:59.605Z",
"modified": "2018-04-05T19:29:59.605Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:29:59.605463Z",
"labels": [
"malicious-activity"
]
}
Checking Semantic Equivalence¶
The Environment has a function for checking if two STIX Objects are semantically equivalent. For each supported object type, the algorithm checks if the values for a specific set of properties match. Each matching property is then weighted, since not every property carries the same level of importance for semantic equivalence. The result will be the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the two objects are not equivalent, and a result of 100 means that they are equivalent.
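The weighting idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the library's code: weighted_score and exact are hypothetical names, the weights are made up, and the normalization by the sum of the weights of properties present on both objects is inferred from the outputs shown later in this section.

```python
def weighted_score(obj1, obj2, weights):
    """Score two dicts; weights maps property -> (weight, compare function).

    Each compare function returns a value between 0.0 and 1.0.
    """
    matching_score = 0.0
    sum_weights = 0.0
    for prop, (weight, compare) in weights.items():
        # Only properties present on both objects are compared.
        if prop not in obj1 or prop not in obj2:
            continue
        sum_weights += weight
        matching_score += weight * compare(obj1[prop], obj2[prop])
    if sum_weights == 0:
        return 0.0
    return (matching_score / sum_weights) * 100.0


def exact(a, b):
    """Simplest possible comparison: 1.0 if equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical weights for a toy object type.
weights = {"name": (30, exact), "external_references": (70, exact)}

a = {"name": "Heartbleed", "external_references": ["ref-1"]}
b = {"name": "Heartbleed"}                                   # no references
c = {"name": "Heartbleed", "external_references": ["ref-2"]}

print(weighted_score(a, b, weights))  # only name is compared
print(weighted_score(a, c, weights))  # name matches, references differ
```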
There are a number of use cases for which calculating semantic equivalence may be helpful. It can be used for echo detection, in which a STIX producer who consumes content from other producers wants to make sure they are not creating content they have already seen or consuming content they have already created.
Another use case for this functionality is to identify identical or near-identical content, such as a vulnerability shared under three different nicknames by three different STIX producers. A third use case involves a feed that aggregates data from multiple other sources. It will want to make sure that it is not publishing duplicate data.
Below we will show examples of the semantic equivalence results of various objects. Unless otherwise specified, the ID of each object will be generated by the library, so the two objects will not have the same ID. This demonstrates that the semantic equivalence algorithm only looks at specific properties for each object type.
Please note that you will need to install a few extra dependencies in order to use the semantic equivalence functions. You can do this using:
pip install stix2[semantic]
Attack Pattern Example¶
For Attack Patterns, the only properties that contribute to semantic equivalence are name and external_references, with weights of 30 and 70, respectively. In this example, both attack patterns have the same external reference but the second has a slightly different yet still similar name.
[3]:
import stix2
from stix2 import Environment, MemoryStore
from stix2.v21 import AttackPattern
env = Environment(store=MemoryStore())
ap1 = AttackPattern(
name="Phishing",
external_references=[
{
"url": "https://example2",
"source_name": "some-source2",
},
],
)
ap2 = AttackPattern(
name="Spear phishing",
external_references=[
{
"url": "https://example2",
"source_name": "some-source2",
},
],
)
print(env.semantically_equivalent(ap1, ap2))
[3]:
91.9
Campaign Example¶
For Campaigns, the only properties that contribute to semantic equivalence are name and aliases, with weights of 60 and 40, respectively. In this example, the two campaigns have different names and neither defines aliases, so only name is compared. The result may be higher than expected because the Jaro-Winkler algorithm used to compare string properties looks at the edit distance of the two strings rather than just the words in them.
[4]:
from stix2.v21 import Campaign
c1 = Campaign(
name="Someone Attacks Somebody",)
c2 = Campaign(
name="Another Campaign",)
print(env.semantically_equivalent(c1, c2))
[4]:
30.0
Identity Example¶
For Identities, the only properties that contribute to semantic equivalence are name, identity_class, and sectors, with weights of 60, 20, and 20, respectively. In this example, the two identities are identical, but are missing one of the contributing properties. The algorithm only compares properties that are actually present on the objects. Also note that they have completely different description properties, but because description is not one of the properties considered for semantic equivalence, this difference has no effect on the result.
[5]:
from stix2.v21 import Identity
id1 = Identity(
name="John Smith",
identity_class="individual",
description="Just some guy",
)
id2 = Identity(
name="John Smith",
identity_class="individual",
description="A person",
)
print(env.semantically_equivalent(id1, id2))
[5]:
100.0
Indicator Example¶
For Indicators, the only properties that contribute to semantic equivalence are indicator_types, pattern, and valid_from, with weights of 15, 80, and 5, respectively. In this example, the two indicators have patterns with different hashes but the same indicator_types and valid_from. For patterns, the algorithm currently only checks if they are identical.
[6]:
from stix2.v21 import Indicator
ind1 = Indicator(
indicator_types=['malicious-activity'],
pattern_type="stix",
pattern="[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']",
valid_from="2017-01-01T12:34:56Z",
)
ind2 = Indicator(
indicator_types=['malicious-activity'],
pattern_type="stix",
pattern="[file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4']",
valid_from="2017-01-01T12:34:56Z",
)
print(env.semantically_equivalent(ind1, ind2))
Indicator pattern equivalence is not fully defined; will default to zero if not completely identical
[6]:
20.0
If the patterns were identical the result would have been 100.
Location Example¶
For Locations, the only properties that contribute to semantic equivalence are longitude/latitude, region, and country, with weights of 34, 33, and 33, respectively. In this example, the two locations are Washington, D.C. and New York City. The algorithm computes the distance between two locations using the haversine formula and uses that to influence equivalence.
[7]:
from stix2.v21 import Location
loc1 = Location(
latitude=38.889,
longitude=-77.023,
)
loc2 = Location(
latitude=40.713,
longitude=-74.006,
)
print(env.semantically_equivalent(loc1, loc2))
[7]:
67.20663955882583
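The score above can be approximated by hand with the haversine formula. The sketch below is an independent calculation, not the library's code. It assumes a mean Earth radius of 6371.0088 km and a linear falloff that reaches zero at 1000 km; both constants are assumptions chosen because they reproduce a value close to the output shown, and the clamp at zero is a choice made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius
THRESHOLD_KM = 1000.0        # assumed distance at which the score reaches 0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Washington, D.C. and New York City, as in the example above.
distance = haversine_km(38.889, -77.023, 40.713, -74.006)

# Closer locations score higher; clamp at 0 beyond the threshold.
score = 100.0 * max(0.0, 1.0 - distance / THRESHOLD_KM)

print("distance: %.1f km, score: %.1f" % (distance, score))
```

Because only latitude and longitude are present on both objects, they are the only contributing properties, so this single comparison determines the whole result.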
Malware Example¶
For Malware, the only properties that contribute to semantic equivalence are malware_types and name, with weights of 20 and 80, respectively. In this example, the two malware objects only differ in the strings in their malware_types lists. For lists, the algorithm bases its calculations on the intersection of the two lists. An empty intersection will result in a 0, and a complete intersection will result in a 1 for that property.
[8]:
from stix2.v21 import Malware
MALWARE_ID = "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e"
mal1 = Malware(id=MALWARE_ID,
malware_types=['ransomware'],
name="Cryptolocker",
is_family=False,
)
mal2 = Malware(id=MALWARE_ID,
malware_types=['ransomware', 'dropper'],
name="Cryptolocker",
is_family=False,
)
print(env.semantically_equivalent(mal1, mal2))
[8]:
90.0
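The 90.0 above can be reproduced with simple arithmetic. The sketch below assumes that the list comparison is the number of shared items divided by the length of the longer list; that formula is an assumption consistent with the results shown in this section, and list_overlap is a hypothetical helper rather than a library function.

```python
def list_overlap(l1, l2):
    """Shared items divided by the length of the longer list (0.0 to 1.0)."""
    if not l1 or not l2:
        return 0.0
    return len(set(l1) & set(l2)) / max(len(l1), len(l2))


# malware_types: one of two entries is shared.
types_score = list_overlap(["ransomware"], ["ransomware", "dropper"])  # 0.5
# name: both objects are named "Cryptolocker", so this is a full match.
name_score = 1.0

# Weights from the text above: malware_types = 20, name = 80.
total = 20 * types_score + 80 * name_score
print(total)  # 90.0
```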
Threat Actor Example¶
For Threat Actors, the only properties that contribute to semantic equivalence are threat_actor_types, name, and aliases, with weights of 20, 60, and 20, respectively. In this example, the two threat actors have the same id properties but everything else is different. Since the id property does not factor into semantic equivalence, the result is not very high. The result is not zero because of the “Token Sort Ratio” algorithm used to compare the name property.
[9]:
from stix2.v21 import ThreatActor
THREAT_ACTOR_ID = "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f"
ta1 = ThreatActor(id=THREAT_ACTOR_ID,
threat_actor_types=["crime-syndicate"],
name="Evil Org",
aliases=["super-evil"],
)
ta2 = ThreatActor(id=THREAT_ACTOR_ID,
threat_actor_types=["spy"],
name="James Bond",
aliases=["007"],
)
print(env.semantically_equivalent(ta1, ta2))
[9]:
6.6000000000000005
Tool Example¶
For Tools, the only properties that contribute to semantic equivalence are tool_types and name, with weights of 20 and 80, respectively. In this example, the two tools have the same values for properties that contribute to semantic equivalence but one has an additional, non-contributing property.
[10]:
from stix2.v21 import Tool
t1 = Tool(
tool_types=["remote-access"],
name="VNC",
)
t2 = Tool(
tool_types=["remote-access"],
name="VNC",
description="This is a tool"
)
print(env.semantically_equivalent(t1, t2))
[10]:
100.0
Vulnerability Example¶
For Vulnerabilities, the only properties that contribute to semantic equivalence are name and external_references, with weights of 30 and 70, respectively. In this example, the two vulnerabilities have the same name but one also has an external reference. The algorithm doesn’t take into account any semantic equivalence contributing properties that are not present on both objects.
[11]:
from stix2.v21 import Vulnerability
vuln1 = Vulnerability(
name="Heartbleed",
external_references=[
{
"url": "https://example",
"source_name": "some-source",
},
],
)
vuln2 = Vulnerability(
name="Heartbleed",
)
print(env.semantically_equivalent(vuln1, vuln2))
[11]:
100.0
Other Examples¶
Comparing objects of different types will result in a ValueError.
[12]:
print(env.semantically_equivalent(ind1, vuln1))
ValueError: The objects to compare must be of the same type!
Some object types do not have a defined method for calculating semantic equivalence and by default will give a warning and a result of zero.
[13]:
from stix2.v21 import Report
r1 = Report(
report_types=["campaign"],
name="Bad Cybercrime",
published="2016-04-06T20:03:00.000Z",
object_refs=["indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7"],
)
r2 = Report(
report_types=["campaign"],
name="Bad Cybercrime",
published="2016-04-06T20:03:00.000Z",
object_refs=["indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7"],
)
print(env.semantically_equivalent(r1, r2))
'report' type has no 'weights' dict specified & thus no semantic equivalence method to call!
[13]:
0
By default, comparing objects of different spec versions will result in a ValueError.
[14]:
from stix2.v20 import Identity as Identity20
id20 = Identity20(
name="John Smith",
identity_class="individual",
)
print(env.semantically_equivalent(id2, id20))
ValueError: The objects to compare must be of the same spec version!
You can optionally allow comparing across spec versions by providing a configuration dictionary using ignore_spec_version, as in the next example:
[15]:
from stix2.v20 import Identity as Identity20
id20 = Identity20(
name="John Smith",
identity_class="individual",
)
print(env.semantically_equivalent(id2, id20, **{"_internal": {"ignore_spec_version": True}}))
[15]:
100.0
Detailed Results¶
If your logging level is set to DEBUG, the function will log more detailed results. These show the semantic equivalence and weighting for each property that is checked, to show how the final result was arrived at.
[16]:
import logging
logging.basicConfig(format='%(message)s')
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
ta3 = ThreatActor(
threat_actor_types=["crime-syndicate"],
name="Evil Org",
aliases=["super-evil"],
)
ta4 = ThreatActor(
threat_actor_types=["spy"],
name="James Bond",
aliases=["007"],
)
print(env.semantically_equivalent(ta3, ta4))
logger.setLevel(logging.ERROR)
Starting semantic equivalence process between: 'threat-actor--664624c7-394e-49ad-ae2a-12f7a48a54a3' and 'threat-actor--1d67719e-6be6-4194-9226-1685986514f5'
-- partial_string_based 'Evil Org' 'James Bond' result: '11'
'name' check -- weight: 60, contributing score: 6.6
-- partial_list_based '['crime-syndicate']' '['spy']' result: '0.0'
'threat_actor_types' check -- weight: 20, contributing score: 0.0
-- partial_list_based '['super-evil']' '['007']' result: '0.0'
'aliases' check -- weight: 20, contributing score: 0.0
Matching Score: 6.6, Sum of Weights: 100.0
[16]:
6.6000000000000005
You can also retrieve the detailed results in a dictionary so that they can be accessed and used programmatically. The semantically_equivalent() function takes an optional third argument, called prop_scores. This argument should be a dictionary into which the detailed debugging information will be stored.
Using prop_scores is simple: pass in a dictionary to semantically_equivalent(), and after the function is done executing, the dictionary will have the various scores in it. Specifically, it will have the overall matching_score and sum_weights, along with the weight and contributing score for each of the semantic equivalence contributing properties.
For example:
[18]:
ta5 = ThreatActor(
threat_actor_types=["crime-syndicate", "spy"],
name="Evil Org",
aliases=["super-evil"],
)
ta6 = ThreatActor(
threat_actor_types=["spy"],
name="James Bond",
aliases=["007"],
)
prop_scores = {}
print("Semantic equivalence score using standard weights: %s" % (env.semantically_equivalent(ta5, ta6, prop_scores)))
print(prop_scores)
for prop in prop_scores:
    if prop not in ["matching_score", "sum_weights"]:
        print("Prop: %s | weight: %s | contributing_score: %s" % (prop, prop_scores[prop]['weight'], prop_scores[prop]['contributing_score']))
    else:
        print("%s: %s" % (prop, prop_scores[prop]))
[18]:
Semantic equivalence score using standard weights: 16.6
[18]:
{'name': {'weight': 60, 'contributing_score': 6.6}, 'threat_actor_types': {'weight': 20, 'contributing_score': 10.0}, 'aliases': {'weight': 20, 'contributing_score': 0.0}, 'matching_score': 16.6, 'sum_weights': 100.0}
[18]:
Prop: name | weight: 60 | contributing_score: 6.6
[18]:
Prop: threat_actor_types | weight: 20 | contributing_score: 10.0
[18]:
Prop: aliases | weight: 20 | contributing_score: 0.0
[18]:
matching_score: 16.6
[18]:
sum_weights: 100.0
Custom Comparisons¶
If you wish, you can customize semantic equivalence comparisons. Specifically, you can do any of three things:
- Provide custom weights for each semantic equivalence contributing property
- Provide custom comparison functions for individual semantic equivalence contributing properties
- Provide a custom semantic equivalence function for a specific object type
The weights dictionary¶
In order to do any of the aforementioned (optional) custom comparisons, you will need to provide a weights dictionary as the last parameter to the semantically_equivalent() method call.
The weights dictionary should contain both the weight and the comparison function for each property. You may use the default weights and functions, or provide your own.
Existing comparison functions¶
For reference, here is a list of the comparison functions already built in the codebase (found in stix2/environment.py):
- custom_pattern_based
- exact_match
- partial_external_reference_based
- partial_list_based
- partial_location_distance
- partial_string_based
- partial_timestamp_based
For instance, if we wanted to compare two of the ThreatActors from before, but use our own weights, then we could do the following:
[19]:
weights = {
"threat-actor": { # You must specify the object type
"name": (30, stix2.environment.partial_string_based), # Each property's value must be a tuple
"threat_actor_types": (50, stix2.environment.partial_list_based), # The 1st component must be the weight
"aliases": (20, stix2.environment.partial_list_based) # The 2nd component must be the comparison function
}
}
print("Using standard weights: %s" % (env.semantically_equivalent(ta5, ta6)))
print("Using custom weights: %s" % (env.semantically_equivalent(ta5, ta6, **weights)))
[19]:
Using standard weights: 16.6
[19]:
Using custom weights: 28.300000000000004
Notice how there is a difference in the semantic equivalence scores, simply due to the fact that custom weights were used.
Custom Weights With prop_scores¶
If we want to use both prop_scores and weights, then they would be the third and fourth arguments, respectively, to semantically_equivalent():
[20]:
prop_scores = {}
weights = {
"threat-actor": {
"name": (45, stix2.environment.partial_string_based),
"threat_actor_types": (10, stix2.environment.partial_list_based),
"aliases": (45, stix2.environment.partial_list_based),
},
}
print(env.semantically_equivalent(ta5, ta6, prop_scores, **weights))
print(prop_scores)
[20]:
9.95
[20]:
{'name': {'weight': 45, 'contributing_score': 4.95}, 'threat_actor_types': {'weight': 10, 'contributing_score': 5.0}, 'aliases': {'weight': 45, 'contributing_score': 0.0}, 'matching_score': 9.95, 'sum_weights': 100.0}
Custom Semantic Equivalence Functions¶
You can also write and use your own semantic equivalence functions. In the examples above, you could replace the built-in comparison functions for any or all properties. For example, here we use a custom string comparison function just for the 'name' property:
[21]:
def my_string_compare(p1, p2):
    if p1 == p2:
        return 1
    else:
        return 0
weights = {
"threat-actor": {
"name": (45, my_string_compare),
"threat_actor_types": (10, stix2.environment.partial_list_based),
"aliases": (45, stix2.environment.partial_list_based),
},
}
print("Using custom string comparison: %s" % (env.semantically_equivalent(ta5, ta6, **weights)))
[21]:
Using custom string comparison: 5.0
You can also customize the comparison of an entire object type instead of just how each property is compared. To do this, provide a weights dictionary to semantically_equivalent() and in this dictionary include a key of "method" whose value is your custom semantic equivalence function for that object type.
If you provide your own custom semantic equivalence method, you must also provide the weights for each of the properties (unless, for some reason, your custom method is weights-agnostic). However, since you are writing the custom method, your weights need not necessarily follow the tuple format specified in the above code box.
Note also that if you want detailed results with prop_scores you will need to implement that in your custom function, but you are not required to do so.
In this next example we use our own custom semantic equivalence function to compare two ThreatActors, and do not support prop_scores.
[22]:
def custom_semantic_equivalence_method(obj1, obj2, **weights):
    sum_weights = 0
    matching_score = 0
    # Compare name
    w = weights['name']
    sum_weights += w
    contributing_score = w * stix2.environment.partial_string_based(obj1['name'], obj2['name'])
    matching_score += contributing_score
    # Compare aliases only for spies
    if 'spy' in obj1['threat_actor_types'] + obj2['threat_actor_types']:
        w = weights['aliases']
        sum_weights += w
        contributing_score = w * stix2.environment.partial_list_based(obj1['aliases'], obj2['aliases'])
        matching_score += contributing_score
    return matching_score, sum_weights
weights = {
"threat-actor": {
"name": 60,
"aliases": 40,
"method": custom_semantic_equivalence_method
}
}
print("Using standard weights: %s" % (env.semantically_equivalent(ta5, ta6)))
print("Using a custom method: %s" % (env.semantically_equivalent(ta5, ta6, **weights)))
[22]:
Using standard weights: 16.6
[22]:
Using a custom method: 6.6000000000000005
You can also write custom functions for comparing objects of your own custom types. As in the previous example, you can use the built-in functions listed above to help with this, or write your own. In the following example we define semantic equivalence for our new x-foobar object type. Notice that this time we have included support for detailed results with prop_scores.
[23]:
def _x_foobar_checks(obj1, obj2, prop_scores, **weights):
    matching_score = 0.0
    sum_weights = 0.0
    if stix2.environment.check_property_present("name", obj1, obj2):
        w = weights["name"]
        sum_weights += w
        contributing_score = w * stix2.environment.partial_string_based(obj1["name"], obj2["name"])
        matching_score += contributing_score
        prop_scores["name"] = (w, contributing_score)
    if stix2.environment.check_property_present("color", obj1, obj2):
        w = weights["color"]
        sum_weights += w
        contributing_score = w * stix2.environment.partial_string_based(obj1["color"], obj2["color"])
        matching_score += contributing_score
        prop_scores["color"] = (w, contributing_score)
    prop_scores["matching_score"] = matching_score
    prop_scores["sum_weights"] = sum_weights
    return matching_score, sum_weights
prop_scores = {}
weights = {
"x-foobar": {
"name": 60,
"color": 40,
"method": _x_foobar_checks,
},
"_internal": {
"ignore_spec_version": False,
},
}
foo1 = {
"type":"x-foobar",
"id":"x-foobar--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061",
"name": "Zot",
"color": "red",
}
foo2 = {
"type":"x-foobar",
"id":"x-foobar--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061",
"name": "Zot",
"color": "blue",
}
print(env.semantically_equivalent(foo1, foo2, prop_scores, **weights))
print(prop_scores)
[23]:
71.6
[23]:
{'name': (60, 60.0), 'color': (40, 11.6), 'matching_score': 71.6, 'sum_weights': 100.0}
FileSystem¶
The FileSystem suite contains FileSystemStore, FileSystemSource and FileSystemSink. Under the hood, all FileSystem objects point to a file directory (on disk) that contains STIX 2 content.
The directory and file structure of the intended STIX 2 content should be:
stix2_content/
    /STIX2 Domain Object type
        STIX2 Domain Object
        STIX2 Domain Object
        .
        .
        .
    /STIX2 Domain Object type
        STIX2 Domain Object
        STIX2 Domain Object
        .
        .
        .
    .
    .
    .
    /STIX2 Domain Object type
The master STIX 2 content directory contains subdirectories, each of which corresponds to a STIX 2 domain object type (e.g. “attack-pattern”, “campaign”, “malware”). Within each STIX 2 domain object subdirectory are JSON files that are STIX 2 domain objects of the specified type. The name of each JSON file corresponds to the ID of the STIX 2 domain object found within that file. A real example of the FileSystem directory structure:
stix2_content/
    /attack-pattern
        attack-pattern--00d0b012-8a03-410e-95de-5826bf542de6.json
        attack-pattern--0a3ead4e-6d47-4ccb-854c-a6a4f9d96b22.json
        attack-pattern--1b7ba276-eedc-4951-a762-0ceea2c030ec.json
    /campaign
    /course-of-action
        course-of-action--2a8de25c-f743-4348-b101-3ee33ab5871b.json
        course-of-action--2c3ce852-06a2-40ee-8fe6-086f6402a739.json
    /identity
        identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5.json
    /indicator
    /intrusion-set
    /malware
        malware--1d808f62-cf63-4063-9727-ff6132514c22.json
        malware--2eb9b131-d333-4a48-9eb4-d8dec46c19ee.json
    /observed-data
    /report
    /threat-actor
    /vulnerability
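Given the layout described above, the location of an object's file follows directly from its ID: the part before the "--" names the subdirectory, and the full ID plus ".json" names the file. The helper below, object_path, is a hypothetical illustration of that mapping and is not part of the library.

```python
from pathlib import PurePosixPath


def object_path(root, stix_id):
    """Build the expected file path for a STIX ID under the content root."""
    obj_type, sep, _uuid = stix_id.partition("--")
    if not sep or not obj_type:
        raise ValueError("not a STIX identifier: %r" % stix_id)
    return PurePosixPath(root) / obj_type / (stix_id + ".json")


path = object_path(
    "stix2_content", "malware--1d808f62-cf63-4063-9727-ff6132514c22")
print(path)
# stix2_content/malware/malware--1d808f62-cf63-4063-9727-ff6132514c22.json
```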
FileSystemStore is intended for use cases where STIX 2 content is retrieved from and pushed to the same file directory, as FileSystemStore is just a wrapper around a paired FileSystemSource and FileSystemSink that point to the same file directory.
For use cases where STIX 2 content will only be retrieved or only be pushed, a FileSystemSource or FileSystemSink can be used individually. They can also be used individually when STIX 2 content will be retrieved from one distinct file directory and pushed to another.
FileSystem API¶
A note on get(), all_versions(), and query(): The format of the STIX 2 content targeted by the FileSystem suite is JSON files. When the FileSystemStore retrieves STIX 2 content (in JSON) from disk, it will attempt to parse the content into full-featured python-stix2 objects and return them as such.
A note on add(): When STIX content is added (pushed) to the file system, the STIX content can be supplied in the following forms: Python STIX objects, Python dictionaries (of valid STIX objects or Bundles), JSON-encoded strings (of valid STIX objects or Bundles), or a (Python) list of any of the previously listed types. Any of the previous STIX content forms will be converted to a STIX JSON object (in a STIX Bundle) and written to disk.
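To make the accepted input forms concrete, the sketch below normalizes dictionaries, JSON strings, Bundles, and lists of these into a flat list of object dictionaries. It only illustrates the idea: to_object_dicts is a hypothetical helper, it does not handle Python STIX objects, and it is not how the library implements add().

```python
import json


def to_object_dicts(content):
    """Flatten dicts, JSON strings, Bundles, or lists of these into dicts."""
    if isinstance(content, list):
        flattened = []
        for item in content:
            flattened.extend(to_object_dicts(item))
        return flattened
    if isinstance(content, str):
        content = json.loads(content)
    if content.get("type") == "bundle":
        # A Bundle contributes each of the objects it contains.
        return list(content.get("objects", []))
    return [content]


as_dict = {"type": "identity", "id": "identity--1", "name": "John Doe"}
as_json = '{"type": "malware", "id": "malware--1", "name": "Cryptolocker"}'
as_bundle = {
    "type": "bundle",
    "id": "bundle--1",
    "objects": [{"type": "campaign", "id": "campaign--1", "name": "Charge"}],
}

normalized = to_object_dicts([as_dict, as_json, as_bundle])
print([obj["id"] for obj in normalized])
# ['identity--1', 'malware--1', 'campaign--1']
```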
FileSystem Examples¶
FileSystemStore¶
Use the FileSystemStore when you want to both retrieve STIX content from the file system and push STIX content to it, too.
[4]:
from stix2 import FileSystemStore
# create FileSystemStore
fs = FileSystemStore("/tmp/stix2_store")
# retrieve STIX2 content from FileSystemStore
ap = fs.get("attack-pattern--00d0b012-8a03-410e-95de-5826bf542de6")
mal = fs.get("malware--00c3bfcb-99bd-4767-8c03-b08f585f5c8a")
# for visual purposes
print(mal)
[4]:
{
"type": "malware",
"id": "malware--00c3bfcb-99bd-4767-8c03-b08f585f5c8a",
"created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
"created": "2017-05-31T21:33:19.746Z",
"modified": "2017-05-31T21:33:19.746Z",
"name": "PowerDuke",
"description": "PowerDuke is a backdoor that was used by APT29 in 2016. It has primarily been delivered through Microsoft Word or Excel attachments containing malicious macros.[[Citation: Volexity PowerDuke November 2016]]",
"labels": [
"malware"
],
"external_references": [
{
"source_name": "mitre-attack",
"url": "https://attack.mitre.org/wiki/Software/S0139",
"external_id": "S0139"
},
{
"source_name": "Volexity PowerDuke November 2016",
"description": "Adair, S.. (2016, November 9). PowerDuke: Widespread Post-Election Spear Phishing Campaigns Targeting Think Tanks and NGOs. Retrieved January 11, 2017.",
"url": "https://www.volexity.com/blog/2016/11/09/powerduke-post-election-spear-phishing-campaigns-targeting-think-tanks-and-ngos/"
}
],
"object_marking_refs": [
"marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
]
}
[2]:
from stix2 import ThreatActor, Indicator
# create new STIX threat-actor
ta = ThreatActor(name="Adjective Bear",
labels=["nation-state"],
sophistication="innovator",
resource_level="government",
goals=[
"compromising media outlets",
"water-hole attacks geared towards political, military targets",
"intelligence collection"
])
# create new indicators
ind = Indicator(description="Crusades C2 implant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '54b7e05e39a59428743635242e4a867c932140a999f52a1e54fa7ee6a440c73b']")
ind1 = Indicator(description="Crusades C2 implant 2",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '64c7e05e40a59511743635242e4a867c932140a999f52a1e54fa7ee6a440c73b']")
# add STIX object (threat-actor) to FileSystemStore
fs.add(ta)
# can also add multiple STIX objects to FileSystemStore in one call
fs.add([ind, ind1])
FileSystemSource¶
Use the FileSystemSource when you only want to retrieve STIX content from the file system.
[6]:
from stix2 import FileSystemSource
# create FileSystemSource
fs_source = FileSystemSource("/tmp/stix2_source")
# retrieve STIX 2 objects
ap = fs_source.get("attack-pattern--00d0b012-8a03-410e-95de-5826bf542de6")
# for visual purposes
print(ap)
[6]:
{
"type": "attack-pattern",
"id": "attack-pattern--00d0b012-8a03-410e-95de-5826bf542de6",
"created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
"created": "2017-05-31T21:30:54.176Z",
"modified": "2017-05-31T21:30:54.176Z",
"name": "Indicator Removal from Tools",
"description": "If a malicious...command-line parameters, Process monitoring",
"kill_chain_phases": [
{
"kill_chain_name": "mitre-attack",
"phase_name": "defense-evasion"
}
],
"external_references": [
{
"source_name": "mitre-attack",
"url": "https://attack.mitre.org/wiki/Technique/T1066",
"external_id": "T1066"
}
],
"object_marking_refs": [
"marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
]
}
[7]:
from stix2 import Filter
# create filter for type=malware
query = [Filter("type", "=", "malware")]
# query on the filter
mals = fs_source.query(query)
for mal in mals:
    print(mal.id)
[7]:
malware--96b08451-b27a-4ff6-893f-790e26393a8e
malware--b42378e0-f147-496f-992a-26a49705395b
malware--6b616fc1-1505-48e3-8b2c-0d19337bff38
malware--92ec0cbd-2c30-44a2-b270-73f4ec949841
[8]:
# add more filters to the query
query.append(Filter("modified", ">" , "2017-05-31T21:33:10.772474Z"))
mals = fs_source.query(query)
# for visual purposes
for mal in mals:
    print(mal.id)
[8]:
malware--92ec0cbd-2c30-44a2-b270-73f4ec949841
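The narrowing effect of adding a second filter can be illustrated without the library. The sketch below works on plain dictionaries and treats a query as a list of (property, operator, value) triples that must all match. The names parse_timestamp and matches, the tuple representation, and the sample objects with placeholder IDs are assumptions made for illustration; they are not the stix2 Filter implementation.

```python
import operator
from datetime import datetime, timezone

# Comparison operators supported by this sketch.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def parse_timestamp(value):
    """Parse a STIX timestamp such as 2017-05-31T21:33:10.772474Z."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def matches(obj, filters):
    """Return True if the dict satisfies every (property, op, value) triple."""
    for prop, op, expected in filters:
        if prop not in obj:
            return False
        actual = obj[prop]
        if prop in ("created", "modified"):
            # Compare timestamps as datetimes rather than as strings.
            actual, expected = parse_timestamp(actual), parse_timestamp(expected)
        if not OPERATORS[op](actual, expected):
            return False
    return True


# Placeholder sample objects; the IDs and timestamps are made up.
objects = [
    {"type": "malware", "id": "malware--00000000-0000-4000-8000-000000000001",
     "modified": "2017-05-31T21:33:19.746Z"},
    {"type": "malware", "id": "malware--00000000-0000-4000-8000-000000000002",
     "modified": "2017-05-31T21:32:58.226Z"},
    {"type": "attack-pattern", "id": "attack-pattern--00000000-0000-4000-8000-000000000003",
     "modified": "2017-05-31T21:33:30.000Z"},
]

query = [("type", "=", "malware")]
by_type = [o["id"] for o in objects if matches(o, query)]

query.append(("modified", ">", "2017-05-31T21:33:10.772474Z"))
by_type_and_time = [o["id"] for o in objects if matches(o, query)]
print(by_type, by_type_and_time)
```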
FileSystemSink¶
Use the FileSystemSink when you only want to push STIX content to the file system.
[10]:
from stix2 import FileSystemSink, Campaign, Indicator
# create FileSystemSink
fs_sink = FileSystemSink("/tmp/stix2_sink")
# create STIX objects and add to sink
camp = Campaign(name="The Crusades",
objective="Infiltrating Israeli, Iranian and Palestinian digital infrastructure and government systems.",
aliases=["Desert Moon"])
ind = Indicator(description="Crusades C2 implant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '54b7e05e39a59428743635242e4a867c932140a999f52a1e54fa7ee6a440c73b']")
ind1 = Indicator(description="Crusades C2 implant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '54b7e05e39a59428743635242e4a867c932140a999f52a1e54fa7ee6a440c73b']")
# add Campaign object to FileSystemSink
fs_sink.add(camp)
# can also add multiple STIX objects to FileSystemSink in one call
fs_sink.add([ind, ind1])
Data Markings¶
Creating Objects With Data Markings¶
To create an object with a (predefined) TLP marking, just provide the marking as a keyword argument to the constructor. The TLP markings can easily be imported from python-stix2.
[7]:
from stix2 import Indicator, TLP_AMBER
indicator = Indicator(labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
object_marking_refs=TLP_AMBER)
print(indicator)
[7]:
{
"type": "indicator",
"id": "indicator--95a71cff-fad0-4ffb-a641-8a6eaa642290",
"created": "2018-04-05T19:49:47.924Z",
"modified": "2018-04-05T19:49:47.924Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:47.924708Z",
"labels": [
"malicious-activity"
],
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
]
}
If you’re creating your own marking (for example, a Statement marking), first create the statement marking:
[8]:
from stix2 import MarkingDefinition, StatementMarking
marking_definition = MarkingDefinition(
definition_type="statement",
definition=StatementMarking(statement="Copyright 2017, Example Corp")
)
print(marking_definition)
[8]:
{
"type": "marking-definition",
"id": "marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368",
"created": "2018-04-05T19:49:53.98008Z",
"definition_type": "statement",
"definition": {
"statement": "Copyright 2017, Example Corp"
}
}
Then you can add it to an object as it’s being created (passing either the full object or the ID as a keyword argument, like with relationships).
[9]:
indicator2 = Indicator(labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
object_marking_refs=marking_definition)
print(indicator2)
[9]:
{
"type": "indicator",
"id": "indicator--7caeab49-2472-41bb-a988-2f990aea99bd",
"created": "2018-04-05T19:49:55.763Z",
"modified": "2018-04-05T19:49:55.763Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:55.763364Z",
"labels": [
"malicious-activity"
],
"object_marking_refs": [
"marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368"
]
}
[10]:
indicator3 = Indicator(labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
object_marking_refs="marking-definition--f88d31f6-486f-44da-b317-01333bde0b82")
print(indicator3)
[10]:
{
"type": "indicator",
"id": "indicator--4eb21bbe-b8a9-4348-86cf-1ed52f9abdd7",
"created": "2018-04-05T19:49:57.248Z",
"modified": "2018-04-05T19:49:57.248Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:57.248658Z",
"labels": [
"malicious-activity"
],
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
]
}
Granular markings work in the same way, except you also need to provide a full granular-marking object (including the selector).
[11]:
from stix2 import Malware, TLP_WHITE
malware = Malware(name="Poison Ivy",
labels=['remote-access-trojan'],
description="A ransomware related to ...",
granular_markings=[
{
"selectors": ["description"],
"marking_ref": marking_definition
},
{
"selectors": ["name"],
"marking_ref": TLP_WHITE
}
])
print(malware)
[11]:
{
"type": "malware",
"id": "malware--ef1eddbb-b5a5-47e0-b607-75b9870d8d91",
"created": "2018-04-05T19:49:59.103Z",
"modified": "2018-04-05T19:49:59.103Z",
"name": "Poison Ivy",
"description": "A ransomware related to ...",
"labels": [
"remote-access-trojan"
],
"granular_markings": [
{
"marking_ref": "marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368",
"selectors": [
"description"
]
},
{
"marking_ref": "marking-definition--613f2e26-407d-48c7-9eca-b8e91df99dc9",
"selectors": [
"name"
]
}
]
}
Make sure that the selector is a field that exists and is populated on the object, otherwise this will cause an error:
[12]:
Malware(name="Poison Ivy",
labels=['remote-access-trojan'],
description="A ransomware related to ...",
granular_markings=[
{
"selectors": ["title"],
"marking_ref": marking_definition
}
])
InvalidSelectorError: Selector title in Malware is not valid!
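The check behind this error can be approximated on a plain dictionary. The sketch below walks a selector such as description or labels.[0] and reports whether it resolves to a value. The function name selector_exists is hypothetical, and the library's own validation may differ in detail.

```python
def selector_exists(obj, selector):
    """Return True if the selector resolves to a value in the dict.

    Hypothetical helper; selectors use "." between steps and "[n]" for
    list indexes, e.g. "external_references.[0].source_name".
    """
    current = obj
    for part in selector.split("."):
        if part.startswith("[") and part.endswith("]"):
            # List index step such as "[0]".
            index_text = part[1:-1]
            if not isinstance(current, list) or not index_text.isdigit():
                return False
            index = int(index_text)
            if index >= len(current):
                return False
            current = current[index]
        else:
            # Property step: the key must exist on the current dict.
            if not isinstance(current, dict) or part not in current:
                return False
            current = current[part]
    return True


malware_dict = {
    "type": "malware",
    "name": "Poison Ivy",
    "labels": ["remote-access-trojan"],
    "description": "A ransomware related to ...",
}
checks = {s: selector_exists(malware_dict, s)
          for s in ("description", "title", "labels.[0]", "labels.[3]")}
print(checks)
```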
Adding Data Markings To Existing Objects¶
Several functions exist to support working with data markings.
Both object markings and granular markings can be added to STIX objects which have already been created.
Note: Doing so will create a new version of the object (note the updated modified time).
[13]:
indicator4 = indicator.add_markings(marking_definition)
print(indicator4)
[13]:
{
"type": "indicator",
"id": "indicator--95a71cff-fad0-4ffb-a641-8a6eaa642290",
"created": "2018-04-05T19:49:47.924Z",
"modified": "2018-04-05T19:50:03.387Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:47.924708Z",
"labels": [
"malicious-activity"
],
"object_marking_refs": [
"marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368",
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
]
}
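Because objects are immutable, adding a marking yields a new version rather than changing the original. The sketch below mimics that behavior on a plain dictionary: it copies the object, appends the marking reference, and updates the modified timestamp. The function name with_added_marking and its now parameter are hypothetical and only illustrate the versioning idea.

```python
import copy
from datetime import datetime, timezone


def with_added_marking(obj, marking_id, now=None):
    """Return a new version of obj carrying an extra object marking.

    Hypothetical helper; the input dict is left untouched.
    """
    new_obj = copy.deepcopy(obj)
    refs = new_obj.setdefault("object_marking_refs", [])
    if marking_id not in refs:
        refs.append(marking_id)
    # A new version keeps "id" and "created" but gets a fresh "modified".
    now = now or datetime.now(timezone.utc)
    new_obj["modified"] = (now.strftime("%Y-%m-%dT%H:%M:%S")
                           + ".{:03d}Z".format(now.microsecond // 1000))
    return new_obj


original = {
    "type": "indicator",
    "id": "indicator--95a71cff-fad0-4ffb-a641-8a6eaa642290",
    "created": "2018-04-05T19:49:47.924Z",
    "modified": "2018-04-05T19:49:47.924Z",
    "object_marking_refs": ["marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"],
}
newer = with_added_marking(
    original,
    "marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368",
    now=datetime(2018, 4, 5, 19, 50, 3, 387000, tzinfo=timezone.utc),
)
print(original["modified"], newer["modified"])
```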
You can also remove specific markings from STIX objects. This will also create a new version of the object.
[14]:
indicator5 = indicator4.remove_markings(marking_definition)
print(indicator5)
[14]:
{
"type": "indicator",
"id": "indicator--95a71cff-fad0-4ffb-a641-8a6eaa642290",
"created": "2018-04-05T19:49:47.924Z",
"modified": "2018-04-05T19:50:05.109Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:47.924708Z",
"labels": [
"malicious-activity"
],
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
]
}
The markings on an object can be replaced with a different set of markings:
[15]:
from stix2 import TLP_GREEN
indicator6 = indicator5.set_markings([TLP_GREEN, marking_definition])
print(indicator6)
[15]:
{
"type": "indicator",
"id": "indicator--95a71cff-fad0-4ffb-a641-8a6eaa642290",
"created": "2018-04-05T19:49:47.924Z",
"modified": "2018-04-05T19:50:06.773Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:47.924708Z",
"labels": [
"malicious-activity"
],
"object_marking_refs": [
"marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368",
"marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
]
}
STIX objects can also be cleared of all markings with clear_markings():
[16]:
indicator7 = indicator5.clear_markings()
print(indicator7)
[16]:
{
"type": "indicator",
"id": "indicator--95a71cff-fad0-4ffb-a641-8a6eaa642290",
"created": "2018-04-05T19:49:47.924Z",
"modified": "2018-04-05T19:50:08.616Z",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T19:49:47.924708Z",
"labels": [
"malicious-activity"
]
}
All of these functions can be used for granular markings by passing in a list of selectors. Note that they will create new versions of the objects.
Evaluating Data Markings¶
You can get a list of the object markings on a STIX object:
[17]:
indicator6.get_markings()
[17]:
['marking-definition--13680b12-3d19-4b42-abe6-0d31effe5368',
'marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da']
To get a list of the granular markings on an object, pass the object and a list of selectors to get_markings():
[18]:
from stix2 import get_markings
get_markings(malware, 'name')
[18]:
['marking-definition--613f2e26-407d-48c7-9eca-b8e91df99dc9']
You can also call get_markings() as a method on the STIX object.
[19]:
malware.get_markings('name')
[19]:
['marking-definition--613f2e26-407d-48c7-9eca-b8e91df99dc9']
Finally, you may also check whether an object is marked by a specific marking. Again, for granular markings, pass in the selector or list of selectors.
[20]:
indicator.is_marked(TLP_AMBER.id)
[20]:
True
[21]:
malware.is_marked(TLP_WHITE.id, 'name')
[21]:
True
[22]:
malware.is_marked(TLP_WHITE.id, 'description')
[22]:
False
Extracting Lang Data Markings or marking-definition Data Markings¶
If you need a specific kind of marking, you can also filter them using the API. By default, the library will get both types of markings. You can choose between lang=True/False or marking_ref=True/False depending on your use case.
[16]:
from stix2 import v21
v21_indicator = v21.Indicator(
description="Una descripcion sobre este indicador",
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
object_marking_refs=['marking-definition--f88d31f6-486f-44da-b317-01333bde0b82'],
indicator_types=['malware'],
granular_markings=[
{
'selectors': ['description'],
'lang': 'es'
},
{
'selectors': ['description'],
'marking_ref': 'marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da'
}
]
)
print(v21_indicator)
# Gets both lang and marking_ref markings for 'description'
print(v21_indicator.get_markings('description'))
# Exclude lang markings from results
print(v21_indicator.get_markings('description', lang=False))
# Exclude marking-definition markings from results
print(v21_indicator.get_markings('description', marking_ref=False))
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--634ef462-d6b5-48bc-9d9f-b46a6919227c",
"created": "2019-05-03T18:36:44.354Z",
"modified": "2019-05-03T18:36:44.354Z",
"description": "Una descripcion sobre este indicador",
"indicator_types": [
"malware"
],
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-05-03T18:36:44.354443Z",
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
],
"granular_markings": [
{
"lang": "es",
"selectors": [
"description"
]
},
{
"marking_ref": "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da",
"selectors": [
"description"
]
}
]
}
['es', 'marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da']
['marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da']
['es']
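The three result lists above follow from a simple selection rule, sketched here on a plain dictionary. The function granular_markings_for is a hypothetical stand-in that only considers exact selector matches; it is not the library's get_markings() implementation.

```python
def granular_markings_for(obj, selector, lang=True, marking_ref=True):
    """Collect granular markings that apply to one selector.

    Hypothetical helper: lang and marking_ref switch each kind on or off.
    """
    results = []
    for marking in obj.get("granular_markings", []):
        if selector not in marking.get("selectors", []):
            continue
        if lang and "lang" in marking:
            results.append(marking["lang"])
        if marking_ref and "marking_ref" in marking:
            results.append(marking["marking_ref"])
    return results


indicator_dict = {
    "type": "indicator",
    "description": "Una descripcion sobre este indicador",
    "granular_markings": [
        {"lang": "es", "selectors": ["description"]},
        {"marking_ref": "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da",
         "selectors": ["description"]},
    ],
}
both = granular_markings_for(indicator_dict, "description")
refs_only = granular_markings_for(indicator_dict, "description", lang=False)
lang_only = granular_markings_for(indicator_dict, "description", marking_ref=False)
print(both, refs_only, lang_only)
```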
In the same manner, calls to clear_markings and set_markings can also operate on one or both types of markings.
[5]:
print(v21_indicator.clear_markings("description")) # By default, both types of markings will be removed
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--a612665a-2df4-4fd2-851c-7fbb8c92339a",
"created": "2019-05-03T19:13:59.010Z",
"modified": "2019-05-03T19:15:41.173Z",
"description": "Una descripcion sobre este indicador",
"indicator_types": [
"malware"
],
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-05-03T19:13:59.010624Z",
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
]
}
[13]:
# If lang is False, no lang markings will be removed
print(v21_indicator.clear_markings("description", lang=False))
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--982aeb4d-4dd3-4b04-aa50-a1d00c31986c",
"created": "2019-05-03T19:19:26.542Z",
"modified": "2019-05-03T19:20:51.818Z",
"description": "Una descripcion sobre este indicador",
"indicator_types": [
"malware"
],
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-05-03T19:19:26.542267Z",
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
],
"granular_markings": [
{
"lang": "es",
"selectors": [
"description"
]
}
]
}
[2]:
# If marking_ref is False, no marking-definition markings will be removed
print(v21_indicator.clear_markings("description", marking_ref=False))
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--de0316d6-38e1-43c2-af4f-649305251864",
"created": "2019-05-03T19:40:21.459Z",
"modified": "2019-05-03T19:40:26.431Z",
"description": "Una descripcion sobre este indicador",
"indicator_types": [
"malware"
],
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-05-03T19:40:21.459582Z",
"object_marking_refs": [
"marking-definition--f88d31f6-486f-44da-b317-01333bde0b82"
],
"granular_markings": [
{
"marking_ref": "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da",
"selectors": [
"description"
]
}
]
}
Memory¶
The Memory suite consists of MemoryStore, MemorySource, and MemorySink. Under the hood, the Memory suite points to an in-memory dictionary. As with the FileSystem suite, the MemoryStore is just a wrapper around a paired MemorySource and MemorySink; because there are quite limited uses for a MemorySource or a MemorySink alone, it is recommended to always use MemoryStore. The MemoryStore is intended for retrieving/searching and pushing STIX content to memory. It is important to note that STIX content in memory is not backed up on the file system (disk), as that functionality is provided by the FileSystemStore. However, the Memory suite does provide some utility methods for saving and loading STIX content to disk. MemoryStore.save_to_file() saves all the STIX content that is in memory to a JSON file. MemoryStore.load_from_file() loads STIX content from a JSON-formatted file.
Memory API¶
A note on adding and retrieving STIX content with the Memory suite: As mentioned, under the hood the Memory suite is an internal, in-memory dictionary. STIX content that is to be added can be in the following forms: python-stix2 objects, (Python) dictionaries (of valid STIX objects or Bundles), JSON-encoded strings (of valid STIX objects or Bundles), or a (Python) list of any of the previously listed types. MemoryStore actually stores and retrieves STIX content as python-stix2 objects.
A note on load_from_file(): For load_from_file(), STIX content is assumed to be in JSON form within the file, as an individual STIX object or in a Bundle. When the JSON is loaded, the STIX objects are parsed into python-stix2 objects before being stored in the in-memory dictionary.
A note on save_to_file(): This method dumps all STIX content that is in the MemoryStore to the specified file. The file format will be JSON, and the STIX content will be within a STIX Bundle.
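The save and load behavior described in these notes can be pictured with a small standard-library round trip: objects are wrapped in a Bundle and written as JSON, then read back into a dictionary. Keying that dictionary by object ID is an assumption made for this sketch, and save_bundle and load_objects are hypothetical names, not MemoryStore methods.

```python
import json
import os
import tempfile
import uuid


def save_bundle(objects, path):
    """Write the given object dicts to path inside a STIX Bundle."""
    bundle = {
        "type": "bundle",
        "id": "bundle--{}".format(uuid.uuid4()),
        "spec_version": "2.0",
        "objects": list(objects),
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(bundle, handle, indent=4)


def load_objects(path):
    """Read a single object or a Bundle and return {id: object}."""
    with open(path, encoding="utf-8") as handle:
        content = json.load(handle)
    if content.get("type") == "bundle":
        return {obj["id"]: obj for obj in content.get("objects", [])}
    return {content["id"]: content}


malware_obj = {
    "type": "malware",
    "id": "malware--9e9b87ce-2b2b-455a-8d5b-26384ccc8d52",
    "created": "2018-04-05T19:50:43.346Z",
    "modified": "2018-04-05T19:50:43.346Z",
    "name": "Alexios",
    "labels": ["rootkit"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "store.json")
    save_bundle([malware_obj], target)
    loaded = load_objects(target)

print(loaded["malware--9e9b87ce-2b2b-455a-8d5b-26384ccc8d52"]["name"])
```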
Memory Examples¶
MemoryStore¶
[3]:
from stix2 import MemoryStore, Indicator
# create default MemoryStore
mem = MemoryStore()
# insert newly created indicator into memory
ind = Indicator(description="Crusades C2 implant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '54b7e05e39a59428743635242e4a867c932140a999f52a1e54fa7ee6a440c73b']")
mem.add(ind)
# for visual purposes
print(mem.get(ind.id))
[3]:
{
"type": "indicator",
"id": "indicator--41a960c7-a6d4-406d-9156-0069cb3bd40d",
"created": "2018-04-05T19:50:41.222Z",
"modified": "2018-04-05T19:50:41.222Z",
"description": "Crusades C2 implant",
"pattern": "[file:hashes.'SHA-256' = '54b7e05e39a59428743635242e4a867c932140a999f52a1e54fa7ee6a440c73b']",
"valid_from": "2018-04-05T19:50:41.222522Z",
"labels": [
"malicious-activity"
]
}
[4]:
from stix2 import Malware
# add multiple STIX objects into memory
ind2 = Indicator(description="Crusades stage 2 implant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '70fa62fb218dd9d936ee570dbe531dfa4e7c128ff37e6af7a6a6b2485487e50a']")
ind3 = Indicator(description="Crusades stage 2 implant variant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '31a45e777e4d58b97f4c43e38006f8cd6580ddabc4037905b2fad734712b582c']")
mal = Malware(labels=["rootkit"], name= "Alexios")
mem.add([ind2,ind3, mal])
# for visual purposes
print(mem.get(ind3.id))
[4]:
{
"type": "indicator",
"id": "indicator--ba2a7acb-a3ac-420b-9288-09988aa99408",
"created": "2018-04-05T19:50:43.343Z",
"modified": "2018-04-05T19:50:43.343Z",
"description": "Crusades stage 2 implant variant",
"pattern": "[file:hashes.'SHA-256' = '31a45e777e4d58b97f4c43e38006f8cd6580ddabc4037905b2fad734712b582c']",
"valid_from": "2018-04-05T19:50:43.343298Z",
"labels": [
"malicious-activity"
]
}
[5]:
from stix2 import Filter
mal = mem.query([Filter("labels","=", "rootkit")])[0]
print(mal)
[5]:
{
"type": "malware",
"id": "malware--9e9b87ce-2b2b-455a-8d5b-26384ccc8d52",
"created": "2018-04-05T19:50:43.346Z",
"modified": "2018-04-05T19:50:43.346Z",
"name": "Alexios",
"labels": [
"rootkit"
]
}
load_from_file() and save_to_file()¶
[8]:
mem_2 = MemoryStore()
# save (dump) all STIX content in MemoryStore to json file
mem.save_to_file("path_to_target_file.json")
# load(add) STIX content from json file into MemoryStore
mem_2.load_from_file("path_to_target_file.json")
mal = mem_2.get("malware--9e9b87ce-2b2b-455a-8d5b-26384ccc8d52")
# for visual purposes
print(mal)
[8]:
{
"type": "malware",
"id": "malware--9e9b87ce-2b2b-455a-8d5b-26384ccc8d52",
"created": "2018-04-05T19:50:43.346Z",
"modified": "2018-04-05T19:50:43.346Z",
"name": "Alexios",
"labels": [
"rootkit"
]
}
Parsing STIX Content¶
Parsing STIX content is as easy as calling the parse() function on a JSON string, dictionary, or file-like object. It will automatically determine the type of the object. The STIX objects within bundle objects, and the cyber observables contained within observed-data objects, will be parsed as well.
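As a rough illustration of what accepting a JSON string, dictionary, or file-like object involves, the sketch below normalizes all three input forms to a dictionary and reads the type property, descending into Bundles. The names load_stix_dict and object_types are hypothetical; the real parse() goes much further and returns full python-stix2 objects.

```python
import io
import json


def load_stix_dict(data):
    """Accept a dict, a JSON string, or a file-like object; return a dict."""
    if isinstance(data, dict):
        return data
    if isinstance(data, str):
        return json.loads(data)
    return json.load(data)  # anything else is treated as file-like


def object_types(data):
    """Return the STIX types found in the input, descending into Bundles."""
    obj = load_stix_dict(data)
    if obj.get("type") == "bundle":
        types = []
        for inner in obj.get("objects", []):
            types.extend(object_types(inner))
        return types
    return [obj["type"]]


# Abbreviated sample content (most required properties omitted for brevity).
identity = {"type": "identity", "name": "Cole Powers", "identity_class": "individual"}
as_string = json.dumps(identity)
as_file = io.StringIO(json.dumps({"type": "bundle", "objects": [identity]}))

results = [object_types(identity), object_types(as_string), object_types(as_file)]
print(results)
```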
Parsing a string
[3]:
from stix2 import parse
input_string = """{
"type": "observed-data",
"id": "observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf",
"created": "2016-04-06T19:58:16.000Z",
"modified": "2016-04-06T19:58:16.000Z",
"first_observed": "2015-12-21T19:00:00Z",
"last_observed": "2015-12-21T19:00:00Z",
"number_observed": 50,
"objects": {
"0": {
"type": "file",
"hashes": {
"SHA-256": "0969de02ecf8a5f003e3f6d063d848c8a193aada092623f8ce408c15bcb5f038"
}
}
}
}"""
obj = parse(input_string)
print(type(obj))
print(obj)
[3]:
<class 'stix2.v20.sdo.ObservedData'>
[3]:
{
"type": "observed-data",
"id": "observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf",
"created": "2016-04-06T19:58:16.000Z",
"modified": "2016-04-06T19:58:16.000Z",
"first_observed": "2015-12-21T19:00:00Z",
"last_observed": "2015-12-21T19:00:00Z",
"number_observed": 50,
"objects": {
"0": {
"type": "file",
"hashes": {
"SHA-256": "0969de02ecf8a5f003e3f6d063d848c8a193aada092623f8ce408c15bcb5f038"
}
}
}
}
Parsing a dictionary
[4]:
input_dict = {
"type": "identity",
"id": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
"created": "2015-12-21T19:59:11Z",
"modified": "2015-12-21T19:59:11Z",
"name": "Cole Powers",
"identity_class": "individual"
}
obj = parse(input_dict)
print(type(obj))
print(obj)
[4]:
<class 'stix2.v20.sdo.Identity'>
[4]:
{
"type": "identity",
"id": "identity--311b2d2d-f010-4473-83ec-1edf84858f4c",
"created": "2015-12-21T19:59:11.000Z",
"modified": "2015-12-21T19:59:11.000Z",
"name": "Cole Powers",
"identity_class": "individual"
}
Parsing a file-like object
[5]:
with open("/tmp/stix2_store/course-of-action/course-of-action--d9727aee-48b8-4fdb-89e2-4c49746ba4dd.json") as file_handle:
    obj = parse(file_handle)
print(type(obj))
print(obj)
[5]:
<class 'stix2.v20.sdo.CourseOfAction'>
[5]:
{
"type": "course-of-action",
"id": "course-of-action--d9727aee-48b8-4fdb-89e2-4c49746ba4dd",
"created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
"created": "2017-05-31T21:30:41.022Z",
"modified": "2017-05-31T21:30:41.022Z",
"name": "Data from Network Shared Drive Mitigation",
"description": "Identify unnecessary system utilities or potentially malicious software that may be used to collect data from a network share, and audit and/or block them by using whitelisting[[CiteRef::Beechey 2010]] tools, like AppLocker,[[CiteRef::Windows Commands JPCERT]][[CiteRef::NSA MS AppLocker]] or Software Restriction Policies[[CiteRef::Corio 2008]] where appropriate.[[CiteRef::TechNet Applocker vs SRP]]"
}
Parsing Custom STIX Content¶
Parsing custom STIX objects and/or STIX objects with custom properties is also done with parse(). Just supply the keyword argument allow_custom=True. When allow_custom is specified, parse() will attempt to convert the supplied STIX content to known STIX 2 domain objects and/or previously defined custom STIX 2 objects. If the conversion cannot be completed (and allow_custom is specified), parse() will treat the supplied STIX 2 content as valid STIX 2 objects and return them. Warning: Specifying allow_custom may lead to critical errors if the custom content is not valid STIX 2 and is then processed further (searched, filtered, modified, etc.). This risk is inherent: the stix2 library cannot guarantee proper processing of unknown custom STIX 2 objects that were explicitly flagged to be allowed, and that therefore may not be valid.
For examples of parsing STIX 2 objects with custom STIX properties, see Custom STIX Content: Custom Properties
For examples of parsing defined custom STIX 2 objects, see Custom STIX Content: Custom STIX Object Types
When retrieving STIX 2 content from a source (e.g. file system, TAXII) that may contain custom STIX 2 content unknown to the user, the user can create a STIX 2 DataStore/Source with the flag allow_custom=True. As mentioned, this configures the DataStore/Source to allow unknown STIX 2 content to be returned (albeit not converted to full STIX 2 domain objects and properties); the stix2 library may still be unable to process the unknown content if it is not valid STIX 2 or does not consist of actual STIX 2 domain objects and properties.
[ ]:
from taxii2client import Collection
from stix2 import CompositeDataSource, FileSystemSource, TAXIICollectionSource
# to allow for the retrieval of unknown custom STIX2 content,
# just create *Stores/*Sources with the 'allow_custom' flag
# create FileSystemSource
fs = FileSystemSource("/path/to/stix2_data/", allow_custom=True)
# create TAXIICollectionSource
colxn = Collection('http://taxii_url')
ts = TAXIICollectionSource(colxn, allow_custom=True)
STIX2 Patterns¶
The Python stix2 library supports STIX 2 patterning insofar as patterns may be used for the pattern property of Indicators, identical to the STIX 2 specification. stix2 does not evaluate patterns against STIX 2 content; for that functionality see cti-pattern-matcher.
Patterns in the stix2 library are built compositely from the bottom up, creating subcomponent expressions first before those at higher levels.
API Tips¶
ObservationExpression¶
Within the STIX 2 Patterning specification, Observation Expressions denote a complete expression to be evaluated against a discrete observation. In other words, an Observation Expression must be created to apply to a single Observation instance. This is further made clear by the brackets ([]) that encapsulate an Observation Expression. Thus, whatever sub-expressions are within the Observation Expression are meant to be matched against the same Observable instance.
This requirement manifests itself within the stix2 library via ObservationExpression. When creating STIX 2 observation expressions, whenever the current expression is complete, wrap it with ObservationExpression(). This allows the complete pattern expression, no matter its complexity, to be rendered as a proper specification-adhering string. Note: When pattern expressions are added to Indicator objects, the expression objects are implicitly converted to string representations. While the extra step may seem tedious in the construction of simple pattern expressions, this explicit marking of observation expressions becomes vital when converting the pattern expressions to strings.
In all the examples, you can observe how, in the process of building pattern expressions, each completed Observation Expression is wrapped with ObservationExpression().
ParentheticalExpression¶
Do not be confused by the ParentheticalExpression object. It is not a distinct expression type; rather, it is used to properly craft pattern expressions by denoting order priority and grouping of expression components. Use it in a similar manner to ObservationExpression, wrapping completed subcomponent expressions with ParentheticalExpression() if explicit ordering is required. For usage examples of ParentheticalExpression, see the ( OR ) AND boolean example below.
BooleanExpressions vs CompoundObservationExpressions¶
Be careful to note the difference between these two very similar pattern components.
BooleanExpressions
Usage: When the boolean sub-expressions refer to the same root object
Example: [domain-name:value = 'www.5z8.info' AND domain-name:resolves_to_refs[*].value = '198.51.100.1/32']
Rendering: when the pattern is rendered, brackets or parentheses will encapsulate the boolean expression
CompoundObservationExpressions
Usage: When the boolean sub-expressions refer to different root objects
Example: [file:name = 'foo.dll'] AND [process:name = 'procfoo']
Rendering: when the pattern is rendered, brackets will encapsulate each boolean sub-expression
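The difference in rendering can be seen with plain string composition. The sketch below is not the stix2 pattern API; comparison and observation are hypothetical helpers that only show where the brackets end up, and they do not escape special characters in values.

```python
def comparison(path, op, value):
    """Render a single comparison, quoting the value as a string literal."""
    return "{} {} '{}'".format(path, op, value)


def observation(*comparisons, joiner="AND"):
    """Wrap one or more comparisons in a single pair of brackets."""
    return "[" + " {} ".format(joiner).join(comparisons) + "]"


# Boolean expression: both comparisons refer to the same root object,
# so they share one observation expression (one pair of brackets).
boolean_pattern = observation(
    comparison("email-message:sender_ref.value", "=", "stark@example.com"),
    comparison("email-message:subject", "=", "Conference Info"),
)

# Compound observation expression: the comparisons refer to different
# root objects, so each gets its own brackets and AND joins the results.
compound_pattern = " AND ".join([
    observation(comparison("file:name", "=", "foo.dll")),
    observation(comparison("process:name", "=", "procfoo")),
])

print(boolean_pattern)
print(compound_pattern)
```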
Examples¶
Comparison Expressions¶
[3]:
from stix2 import DomainName, File, IPv4Address
from stix2 import (ObjectPath, EqualityComparisonExpression, ObservationExpression,
GreaterThanComparisonExpression, IsSubsetComparisonExpression,
FloatConstant, StringConstant)
Equality Comparison expressions¶
[7]:
lhs = ObjectPath("domain-name", ["value"])
ece_1 = ObservationExpression(EqualityComparisonExpression(lhs, "site.of.interest.zaz"))
print("\t{}\n".format(ece_1))
lhs = ObjectPath("file", ["parent_directory_ref","path"])
ece_2 = ObservationExpression(EqualityComparisonExpression(lhs, "C:\\Windows\\System32"))
print("\t{}\n".format(ece_2))
[domain-name:value = 'site.of.interest.zaz']
[file:parent_directory_ref.path = 'C:\\Windows\\System32']
Greater-than Comparison expressions¶
[5]:
lhs = ObjectPath("file", ["extensions", "windows-pebinary-ext", "sections[*]", "entropy"])
gte = ObservationExpression(GreaterThanComparisonExpression(lhs, FloatConstant("7.0")))
print("\t{}\n".format(gte))
[file:extensions.windows-pebinary-ext.sections[*].entropy > 7.0]
IsSubset Comparison expressions¶
[6]:
lhs = ObjectPath("network-traffic", ["dst_ref", "value"])
iss = ObservationExpression(IsSubsetComparisonExpression(lhs, StringConstant("2001:0db8:dead:beef:0000:0000:0000:0000/64")))
print("\t{}\n".format(iss))
[network-traffic:dst_ref.value ISSUBSET '2001:0db8:dead:beef:0000:0000:0000:0000/64']
Compound Observation Expressions¶
[1]:
from stix2 import (IntegerConstant, HashConstant, ObjectPath,
EqualityComparisonExpression, AndBooleanExpression,
OrBooleanExpression, ParentheticalExpression,
AndObservationExpression, OrObservationExpression,
FollowedByObservationExpression, ObservationExpression)
AND boolean¶
[3]:
ece3 = EqualityComparisonExpression(ObjectPath("email-message", ["sender_ref", "value"]), "stark@example.com")
ece4 = EqualityComparisonExpression(ObjectPath("email-message", ["subject"]), "Conference Info")
abe = ObservationExpression(AndBooleanExpression([ece3, ece4]))
print("(AND)\n{}\n".format(abe))
(AND)
[email-message:sender_ref.value = 'stark@example.com' AND email-message:subject = 'Conference Info']
OR boolean¶
[4]:
ece5 = EqualityComparisonExpression(ObjectPath("url", ["value"]), "http://example.com/foo")
ece6 = EqualityComparisonExpression(ObjectPath("url", ["value"]), "http://example.com/bar")
obe = ObservationExpression(OrBooleanExpression([ece5, ece6]))
print("(OR)\n{}\n".format(obe))
(OR)
[url:value = 'http://example.com/foo' OR url:value = 'http://example.com/bar']
( OR ) AND boolean¶
[5]:
ece7 = EqualityComparisonExpression(ObjectPath("file", ["name"]), "pdf.exe")
ece8 = EqualityComparisonExpression(ObjectPath("file", ["size"]), IntegerConstant("371712"))
ece9 = EqualityComparisonExpression(ObjectPath("file", ["created"]), "2014-01-13T07:03:17Z")
obe1 = OrBooleanExpression([ece7, ece8])
pobe = ParentheticalExpression(obe1)
abe1 = ObservationExpression(AndBooleanExpression([pobe, ece9]))
print("(OR,AND)\n{}\n".format(abe1))
(OR,AND)
[(file:name = 'pdf.exe' OR file:size = 371712) AND file:created = 2014-01-13 07:03:17+00:00]
( AND ) OR ( OR ) observation¶
[6]:
ece20 = ObservationExpression(EqualityComparisonExpression(ObjectPath("file", ["name"]), "foo.dll"))
ece21 = ObservationExpression(EqualityComparisonExpression(ObjectPath("win-registry-key", ["key"]), "HKEY_LOCAL_MACHINE\\foo\\bar"))
ece22 = EqualityComparisonExpression(ObjectPath("process", ["name"]), "fooproc")
ece23 = EqualityComparisonExpression(ObjectPath("process", ["name"]), "procfoo")
# NOTE: we need to use AND/OR observation expression instead of just boolean
# expressions as the operands are not on the same object-type
aoe = ParentheticalExpression(AndObservationExpression([ece20, ece21]))
obe2 = ObservationExpression(OrBooleanExpression([ece22, ece23]))
ooe = OrObservationExpression([aoe, obe2])
print("(AND,OR,OR)\n{}\n".format(ooe))
(AND,OR,OR)
([file:name = 'foo.dll'] AND [win-registry-key:key = 'HKEY_LOCAL_MACHINE\\foo\\bar']) OR [process:name = 'fooproc' OR process:name = 'procfoo']
FOLLOWED-BY¶
[7]:
ece10 = ObservationExpression(EqualityComparisonExpression(ObjectPath("file", ["hashes", "MD5"]), HashConstant("79054025255fb1a26e4bc422aef54eb4", "MD5")))
ece11 = ObservationExpression(EqualityComparisonExpression(ObjectPath("win-registry-key", ["key"]), "HKEY_LOCAL_MACHINE\\foo\\bar"))
fbe = FollowedByObservationExpression([ece10, ece11])
print("(FollowedBy)\n{}\n".format(fbe))
(FollowedBy)
[file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4'] FOLLOWEDBY [win-registry-key:key = 'HKEY_LOCAL_MACHINE\\foo\\bar']
Qualified Observation Expressions¶
[8]:
from stix2 import (TimestampConstant, HashConstant, ObjectPath, EqualityComparisonExpression,
AndBooleanExpression, WithinQualifier, RepeatQualifier, StartStopQualifier,
QualifiedObservationExpression, FollowedByObservationExpression,
ParentheticalExpression, ObservationExpression)
WITHIN¶
[9]:
ece10 = ObservationExpression(EqualityComparisonExpression(ObjectPath("file", ["hashes", "MD5"]), HashConstant("79054025255fb1a26e4bc422aef54eb4", "MD5")))
ece11 = ObservationExpression(EqualityComparisonExpression(ObjectPath("win-registry-key", ["key"]), "HKEY_LOCAL_MACHINE\\foo\\bar"))
fbe = FollowedByObservationExpression([ece10, ece11])
par = ParentheticalExpression(fbe)
qoe = QualifiedObservationExpression(par, WithinQualifier(300))
print("(WITHIN)\n{}\n".format(qoe))
(WITHIN)
([file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4'] FOLLOWEDBY [win-registry-key:key = 'HKEY_LOCAL_MACHINE\\foo\\bar']) WITHIN 300 SECONDS
REPEATS, WITHIN¶
[10]:
ece12 = EqualityComparisonExpression(ObjectPath("network-traffic", ["dst_ref", "type"]), "domain-name")
ece13 = EqualityComparisonExpression(ObjectPath("network-traffic", ["dst_ref", "value"]), "example.com")
abe2 = ObservationExpression(AndBooleanExpression([ece12, ece13]))
qoe1 = QualifiedObservationExpression(QualifiedObservationExpression(abe2, RepeatQualifier(5)), WithinQualifier(180))
print("(REPEAT, WITHIN)\n{}\n".format(qoe1))
(REPEAT, WITHIN)
[network-traffic:dst_ref.type = 'domain-name' AND network-traffic:dst_ref.value = 'example.com'] REPEATS 5 TIMES WITHIN 180 SECONDS
START, STOP¶
[11]:
ece14 = ObservationExpression(EqualityComparisonExpression(ObjectPath("file", ["name"]), "foo.dll"))
ssq = StartStopQualifier(TimestampConstant('2016-06-01T00:00:00Z'), TimestampConstant('2016-07-01T00:00:00Z'))
qoe2 = QualifiedObservationExpression(ece14, ssq)
print("(START-STOP)\n{}\n".format(qoe2))
(START-STOP)
[file:name = 'foo.dll'] START t'2016-06-01T00:00:00Z' STOP t'2016-07-01T00:00:00Z'
Attaching patterns to STIX2 Domain objects¶
Example¶
[10]:
from stix2 import Indicator, EqualityComparisonExpression, ObjectPath, ObservationExpression
ece14 = ObservationExpression(EqualityComparisonExpression(ObjectPath("file", ["name"]), "$$t00rzch$$.elf"))
ind = Indicator(name="Cryptotorch", labels=["malware", "ransomware"], pattern=ece14)
print(ind)
{
"type": "indicator",
"id": "indicator--219bc5fc-fdbf-4b54-a2fc-921be7ab3acb",
"created": "2018-08-29T23:58:00.548Z",
"modified": "2018-08-29T23:58:00.548Z",
"name": "Cryptotorch",
"pattern": "[file:name = '$$t00rzch$$.elf']",
"valid_from": "2018-08-29T23:58:00.548391Z",
"labels": [
"malware",
"ransomware"
]
}
Serializing STIX Objects¶
The string representation of all STIX classes is a valid STIX JSON object.
[3]:
from stix2 import Indicator
indicator = Indicator(name="File hash for malware variant",
labels=["malicious-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
print(str(indicator))
[3]:
{
"type": "indicator",
"id": "indicator--4336ace8-d985-413a-8e32-f749ba268dc3",
"created": "2018-04-05T20:01:20.012Z",
"modified": "2018-04-05T20:01:20.012Z",
"name": "File hash for malware variant",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2018-04-05T20:01:20.012209Z",
"labels": [
"malicious-activity"
]
}
However, the string representation can be slow, as it sorts properties to be in a more readable order. If you need performance and don’t care about the human-readability of the output, use the object’s serialize() function:
[4]:
print(indicator.serialize())
[4]:
{"name": "File hash for malware variant", "labels": ["malicious-activity"], "pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']", "type": "indicator", "id": "indicator--4336ace8-d985-413a-8e32-f749ba268dc3", "created": "2018-04-05T20:01:20.012Z", "modified": "2018-04-05T20:01:20.012Z", "valid_from": "2018-04-05T20:01:20.012209Z"}
If you need performance but also need human-readable output, you can pass the indent keyword argument to serialize():
[5]:
print(indicator.serialize(indent=4))
[5]:
{
"name": "File hash for malware variant",
"labels": [
"malicious-activity"
],
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"type": "indicator",
"id": "indicator--4336ace8-d985-413a-8e32-f749ba268dc3",
"created": "2018-04-05T20:01:20.012Z",
"modified": "2018-04-05T20:01:20.012Z",
"valid_from": "2018-04-05T20:01:20.012209Z"
}
The only difference between this and the string representation from using str() is that this will not sort the keys. This works because the keyword arguments are passed to json.dumps() internally.
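As a rough illustration of why indent only changes the layout, the sketch below calls the standard library's json.dumps() directly on a plain dictionary. The dictionary is placeholder data standing in for an object's properties; it is an assumption made for this example and does not use the stix2 API.

```python
import json

# Placeholder data standing in for a STIX object's properties (assumption).
props = {
    "name": "File hash for malware variant",
    "labels": ["malicious-activity"],
    "type": "indicator",
}

# Compact form: comparable to calling serialize() with no arguments.
compact = json.dumps(props)

# Indented form: comparable to serialize(indent=4).
# Keys keep their insertion order; only whitespace is added.
pretty = json.dumps(props, indent=4)

# Both strings decode to the same data.
same_data = json.loads(compact) == json.loads(pretty)
print(pretty)
```

Any other json.dumps() keyword argument would be expected to pass through in the same way, though that is worth confirming against the version of the library you have installed.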
TAXIICollection¶
The TAXIICollection suite contains TAXIICollectionStore, TAXIICollectionSource, and TAXIICollectionSink. TAXIICollectionStore pushes and retrieves STIX content to local/remote TAXII Collection(s). TAXIICollectionSource retrieves STIX content from local/remote TAXII Collection(s). TAXIICollectionSink pushes STIX content to local/remote TAXII Collection(s). Each of the interfaces is designed to be bound to a Collection from the taxii2client library (taxii2client.Collection), where all TAXIICollection API calls will be executed through that Collection instance.
A note on TAXII2 searching/filtering of STIX content: TAXII2 server implementations natively support searching on the STIX2 object properties: id, type and version; API requests made to TAXII2 can contain filter arguments for those 3 properties. However, the TAXIICollection suite supports searching on all STIX2 common object properties (see Filters documentation for full listing). This works simply by augmenting the filtering that is done remotely at the TAXII2 server instance. TAXIICollection will separate any supplied queries into TAXII-supported filters and non-supported filters. During a TAXIICollection API call, TAXII2-supported filters get inserted into the TAXII2 server request (to be evaluated at the server). The rest of the filters are kept locally and then applied to the STIX2 content that is returned from the TAXII2 server, before being returned from the TAXIICollection API call.
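The split between server-side and local filtering described above can be sketched in plain Python. This is an illustration only, not the library's implementation: SimpleFilter, split_filters and apply_local are hypothetical names, the sketch handles only the "=" operator, and the server response is made-up data.

```python
from collections import namedtuple

# Hypothetical stand-in for a filter; the real class is stix2.Filter.
SimpleFilter = namedtuple("SimpleFilter", ["property", "op", "value"])

# Properties the text says TAXII2 servers can filter on natively.
TAXII_SUPPORTED = {"id", "type", "version"}


def split_filters(filters):
    """Separate filters into server-side (TAXII) and client-side (local) groups."""
    remote = [f for f in filters if f.property in TAXII_SUPPORTED]
    local = [f for f in filters if f.property not in TAXII_SUPPORTED]
    return remote, local


def apply_local(objects, local_filters):
    """Keep only the objects that match every local equality filter."""
    return [
        obj for obj in objects
        if all(obj.get(f.property) == f.value for f in local_filters)
    ]


query = [
    SimpleFilter("type", "=", "indicator"),
    SimpleFilter("name", "=", "File hash for Poison Ivy variant"),
]
remote, local = split_filters(query)

# Made-up data: pretend the server returned this for the remote filters.
server_response = [
    {"type": "indicator", "name": "File hash for Poison Ivy variant"},
    {"type": "indicator", "name": "Something else"},
]
results = apply_local(server_response, local)
print(len(results))
```

Here the "type" filter would travel with the server request, while the "name" filter is applied afterwards to whatever the server returns.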
TAXIICollection API¶
TAXIICollection Examples¶
TAXIICollectionSource¶
[18]:
from stix2 import TAXIICollectionSource
from taxii2client import Collection
# establish TAXII2 Collection instance
collection = Collection("http://127.0.0.1:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/", user="admin", password="Password0")
# supply the TAXII2 collection to TAXIICollection
tc_source = TAXIICollectionSource(collection)
#retrieve STIX objects by id
stix_obj = tc_source.get("malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111")
stix_obj_versions = tc_source.all_versions("indicator--a932fcc6-e032-476c-826f-cb970a5a1ade")
#for visual purposes
print(stix_obj)
print("-------")
for so in stix_obj_versions:
print(so)
{
"type": "malware",
"id": "malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111",
"created": "2017-01-27T13:49:53.997Z",
"modified": "2017-01-27T13:49:53.997Z",
"name": "Poison Ivy",
"description": "Poison Ivy",
"labels": [
"remote-access-trojan"
]
}
-------
{
"type": "indicator",
"id": "indicator--a932fcc6-e032-476c-826f-cb970a5a1ade",
"created": "2014-05-08T09:00:00.000Z",
"modified": "2014-05-08T09:00:00.000Z",
"name": "File hash for Poison Ivy variant",
"pattern": "[file:hashes.'SHA-256' = 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c']",
"valid_from": "2014-05-08T09:00:00Z",
"labels": [
"file-hash-watchlist"
]
}
[20]:
from stix2 import Filter
# retrieve multiple object from TAXIICollectionSource
# by using filters
f1 = Filter("type","=", "indicator")
indicators = tc_source.query([f1])
#for visual purposes
for indicator in indicators:
print(indicator)
{
"type": "indicator",
"id": "indicator--a932fcc6-e032-476c-826f-cb970a5a1ade",
"created": "2014-05-08T09:00:00.000Z",
"modified": "2014-05-08T09:00:00.000Z",
"name": "File hash for Poison Ivy variant",
"pattern": "[file:hashes.'SHA-256' = 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c']",
"valid_from": "2014-05-08T09:00:00Z",
"labels": [
"file-hash-watchlist"
]
}
TAXIICollectionSink¶
[ ]:
from stix2 import TAXIICollectionSink, ThreatActor
#create TAXIICollectionSINK and push STIX content to it
tc_sink = TAXIICollectionSink(collection)
# create new STIX threat-actor
ta = ThreatActor(name="Teddy Bear",
labels=["nation-state"],
sophistication="innovator",
resource_level="government",
goals=[
"compromising environment NGOs",
"water-hole attacks geared towards energy sector",
])
tc_sink.add(ta)
TAXIICollectionStore¶
[19]:
from stix2 import TAXIICollectionStore
# create TAXIICollectionStore - note the same collection instance can
# be used for the store
tc_store = TAXIICollectionStore(collection)
# retrieve STIX object by id from TAXII Collection through
# TAXIICollectionStore
stix_obj2 = tc_store.get("malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111")
print(stix_obj2)
{
"type": "malware",
"id": "malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111",
"created": "2017-01-27T13:49:53.997Z",
"modified": "2017-01-27T13:49:53.997Z",
"name": "Poison Ivy",
"description": "Poison Ivy",
"labels": [
"remote-access-trojan"
]
}
[ ]:
from stix2 import Indicator
# add STIX object to TAXIICollectionStore
ind = Indicator(description="Smokey Bear implant",
labels=["malicious-activity"],
pattern="[file:hashes.'SHA-256' = '09c7e05a39a59428743635242e4a867c932140a909f12a1e54fa7ee6a440c73b']")
tc_store.add(ind)
Bug and Workaround¶
You may get an error similar to the following when adding STIX objects to a TAXIICollectionStore or TAXIICollectionSink:
TypeError: Object of type ThreatActor is not JSON serializable
This is a known bug and we are working to fix it. For more information, see this GitHub issue. In the meantime, try this workaround:
[ ]:
tc_sink.add(json.loads(Bundle(ta).serialize()))
Or bypass the TAXIICollection altogether and interact with the collection itself:
[ ]:
collection.add_objects(json.loads(Bundle(ta).serialize()))
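The error message suggests that the object reaches json.dumps() without first being converted to plain Python data. The sketch below reproduces that situation with the standard library only. FakeThreatActor is a hypothetical stand-in written for this example, not a stix2 class, and the sketch does not claim to show where the failure occurs inside the library.

```python
import json


class FakeThreatActor:
    """Hypothetical stand-in for a STIX object that exposes serialize()."""

    def __init__(self, name):
        self.name = name

    def serialize(self):
        return json.dumps({"type": "threat-actor", "name": self.name})


ta_like = FakeThreatActor("Teddy Bear")

# Handing the object itself to json.dumps() fails with a TypeError.
try:
    json.dumps({"objects": [ta_like]})
    failed = False
except TypeError:
    failed = True

# Round-tripping through serialize() and json.loads() yields plain dicts,
# which json.dumps() can encode without help.
as_dict = json.loads(ta_like.serialize())
payload = json.dumps({"objects": [as_dict]})
print(failed, payload)
```

Note that the workaround cells above also rely on the json module and the Bundle class having been imported earlier in the session.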
Technical Specification Support¶
How imports work¶
Imports can be used in different ways depending on the use case and support levels.
People who want to support the latest version of STIX 2.X without having to make changes can implicitly use the latest version:
[ ]:
import stix2
stix2.Indicator()
or,
[ ]:
from stix2 import Indicator
Indicator()
People who want to use an explicit version:
[ ]:
import stix2.v20
stix2.v20.Indicator()
or,
[ ]:
from stix2.v20 import Indicator
Indicator()
or even,
[ ]:
import stix2.v20 as stix2
stix2.Indicator()
The last option makes it easy to update to a new version in one place per file, once you’ve made the deliberate action to do this.
People who want to use multiple versions in a single file:
[ ]:
import stix2
stix2.v20.Indicator()
stix2.v21.Indicator()
or,
[ ]:
from stix2 import v20, v21
v20.Indicator()
v21.Indicator()
or (less preferred):
[ ]:
from stix2.v20 import Indicator as Indicator_v20
from stix2.v21 import Indicator as Indicator_v21
Indicator_v20()
Indicator_v21()
How parsing works¶
If the version argument is not provided, the library will make its best attempt to determine the version using the “spec_version” property found on a Bundle, SDOs, and SROs.
You can lock your parse() call to a specific STIX version by passing the version argument:
[2]:
from stix2 import parse
indicator = parse("""{
"type": "indicator",
"id": "indicator--dbcbd659-c927-4f9a-994f-0a2632274394",
"created": "2017-09-26T23:33:39.829Z",
"modified": "2017-09-26T23:33:39.829Z",
"labels": [
"malicious-activity"
],
"name": "File hash for malware variant",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2017-09-26T23:33:39.829952Z"
}""", version="2.0")
print(indicator)
[2]:
{
"type": "indicator",
"id": "indicator--dbcbd659-c927-4f9a-994f-0a2632274394",
"created": "2017-09-26T23:33:39.829Z",
"modified": "2017-09-26T23:33:39.829Z",
"name": "File hash for malware variant",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2017-09-26T23:33:39.829952Z",
"labels": [
"malicious-activity"
]
}
Keep in mind that if a 2.1 or higher object is parsed, the operation will fail.
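To make the idea of version detection and version locking concrete, here is a minimal standard-library sketch. It is not how parse() is implemented: detect_spec_version and check_locked are hypothetical helpers, the fallback default of "2.0" is an assumption for this example, and the ValueError stands in for whatever error the library actually raises.

```python
import json

DEFAULT_VERSION = "2.0"  # assumption made for this sketch only


def detect_spec_version(stix_json, default=DEFAULT_VERSION):
    """Return the "spec_version" found in the JSON text, or a default."""
    data = json.loads(stix_json)
    return data.get("spec_version", default)


def check_locked(stix_json, version):
    """Raise ValueError if the content declares a different version."""
    found = detect_spec_version(stix_json, default=version)
    if found != version:
        raise ValueError(
            "content declares STIX {} but parsing is locked to {}".format(found, version)
        )
    return found


bundle_20 = '{"type": "bundle", "spec_version": "2.0", "objects": []}'
indicator_21 = '{"type": "indicator", "spec_version": "2.1"}'
no_version = '{"type": "indicator"}'

print(detect_spec_version(bundle_20))
print(detect_spec_version(indicator_21))
print(detect_spec_version(no_version))
```

Calling check_locked(indicator_21, "2.0") raises in this sketch, which mirrors the failure described above when 2.1 content is parsed with the version locked to 2.0.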
How custom content works¶
CustomObject, CustomObservable, CustomMarking and CustomExtension must be registered explicitly by STIX version. This is a design decision since properties or requirements may change as the STIX Technical Specification advances.
You can perform this by:
[ ]:
import stix2
# Make my custom observable available in STIX 2.0
@stix2.v20.CustomObservable('x-new-object-type',
[("prop", stix2.properties.BooleanProperty())])
class NewObject2(object):
pass
# Make my custom observable available in STIX 2.1
@stix2.v21.CustomObservable('x-new-object-type',
[("prop", stix2.properties.BooleanProperty())])
class NewObject2(object):
pass
Versioning¶
To create a new version of an existing object, specify the property(ies) you want to change and their new values. For example, here we change the label from “anomalous-activity” to “malicious-activity”:
[3]:
from stix2 import Indicator
indicator = Indicator(created="2016-01-01T08:00:00.000Z",
name="File hash for suspicious file",
description="A file indicator",
labels=["anomalous-activity"],
pattern="[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']")
indicator2 = indicator.new_version(name="File hash for Foobar malware",
labels=["malicious-activity"])
print(indicator2)
[3]:
{
"type": "indicator",
"id": "indicator--8ad18fc7-457c-475d-b292-1ec44febe0fd",
"created": "2016-01-01T08:00:00.000Z",
"modified": "2019-07-25T17:59:34.815Z",
"name": "File hash for Foobar malware",
"description": "A file indicator",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-07-25T17:59:34.779826Z",
"labels": [
"malicious-activity"
]
}
The modified time will be updated to the current time unless you provide a specific value as a keyword argument. Note that you can’t change the type, id, or created properties.
[4]:
indicator.new_version(id="indicator--cc42e358-8b9b-493c-9646-6ecd73b41c21")
UnmodifiablePropertyError: These properties cannot be changed when making a new version: id.
You can remove optional or custom properties by setting them to None when you call new_version().
[5]:
indicator3 = indicator.new_version(description=None)
print(indicator3)
[5]:
{
"type": "indicator",
"id": "indicator--8ad18fc7-457c-475d-b292-1ec44febe0fd",
"created": "2016-01-01T08:00:00.000Z",
"modified": "2019-07-25T17:59:42.648Z",
"name": "File hash for suspicious file",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-07-25T17:59:34.779826Z",
"labels": [
"anomalous-activity"
]
}
To revoke an object:
[6]:
indicator4 = indicator3.revoke()
print(indicator4)
[6]:
{
"type": "indicator",
"id": "indicator--8ad18fc7-457c-475d-b292-1ec44febe0fd",
"created": "2016-01-01T08:00:00.000Z",
"modified": "2019-07-25T17:59:52.198Z",
"name": "File hash for suspicious file",
"pattern": "[file:hashes.md5 = 'd41d8cd98f00b204e9800998ecf8427e']",
"valid_from": "2019-07-25T17:59:34.779826Z",
"revoked": true,
"labels": [
"anomalous-activity"
]
}
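The versioning rules in this section can be summarised with a small sketch that works on plain dictionaries. It is an illustration of the rules as described here, not the library's code: new_version_sketch and revoke_sketch are hypothetical names, the sample object uses a shortened placeholder id, and a ValueError stands in for UnmodifiablePropertyError.

```python
import copy
from datetime import datetime, timezone

# Properties the text says cannot change between versions.
UNMODIFIABLE = ("type", "id", "created")


def new_version_sketch(obj, **changes):
    """Return a copy of obj with changes applied, following the rules above."""
    for prop in UNMODIFIABLE:
        if prop in changes:
            raise ValueError("cannot change {!r} when making a new version".format(prop))
    updated = copy.deepcopy(obj)
    # Default "modified" to the current time unless the caller supplied one.
    if "modified" not in changes:
        now = datetime.now(timezone.utc)
        changes["modified"] = now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    for prop, value in changes.items():
        if value is None:
            updated.pop(prop, None)  # None removes the property
        else:
            updated[prop] = value
    return updated


def revoke_sketch(obj):
    """Revoking is modelled here as a new version with revoked set to True."""
    return new_version_sketch(obj, revoked=True)


original = {
    "type": "indicator",
    "id": "indicator--example",  # placeholder id for this sketch
    "created": "2016-01-01T08:00:00.000Z",
    "modified": "2016-01-01T08:00:00.000Z",
    "description": "A file indicator",
    "labels": ["anomalous-activity"],
}

v2 = new_version_sketch(original, labels=["malicious-activity"], description=None)
v3 = revoke_sketch(v2)
print(v2["labels"], "description" in v2, v3["revoked"])
```

The original dictionary is left untouched, in keeping with the library's preference for immutable objects.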
Using The Workbench¶
The Workbench API hides most of the complexity of the rest of the library to make it easy to interact with STIX data. To use it, just import everything from stix2.workbench:
[3]:
from stix2.workbench import *
Retrieving STIX Data¶
To get some STIX data to work with, let’s set up a DataSource and add it to our workbench.
[4]:
from taxii2client import Collection
collection = Collection("http://127.0.0.1:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/", user="admin", password="Password0")
tc_source = TAXIICollectionSource(collection)
add_data_source(tc_source)
Now we can get all of the indicators from the data source.
[5]:
response = indicators()
Similar functions are available for the other STIX Object types. See the full list here.
If you want to only retrieve some indicators, you can pass in one or more Filters. This example finds all the indicators created by a specific identity:
[6]:
response = indicators(filters=Filter('created_by_ref', '=', 'identity--adede3e8-bf44-4e6f-b3c9-1958cbc3b188'))
The objects returned let you easily traverse their relationships. Get all Relationship objects involving that object with .relationships(), all other objects related to this object with .related(), and the Identity object for the creator of the object (if one exists) with .created_by(). For full details on these methods and their arguments, see the Workbench API documentation.
[7]:
for i in indicators():
for rel in i.relationships():
print(rel.source_ref)
print(rel.relationship_type)
print(rel.target_ref)
[7]:
indicator--a932fcc6-e032-476c-826f-cb970a5a1ade
indicates
malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111
[8]:
for i in indicators():
for obj in i.related():
print(obj)
[8]:
{
"type": "malware",
"id": "malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111",
"created": "2017-01-27T13:49:53.997Z",
"modified": "2017-01-27T13:49:53.997Z",
"name": "Poison Ivy",
"description": "Poison Ivy",
"labels": [
"remote-access-trojan"
]
}
If there are a lot of related objects, you can narrow it down by passing in one or more Filters just as before. For example, if we want to get only the indicators related to a specific piece of malware (and not any entities that use it or are targeted by it):
[9]:
malware = get('malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111')
indicator = malware.related(filters=Filter('type', '=', 'indicator'))
print(indicator[0])
[9]:
{
"type": "indicator",
"id": "indicator--a932fcc6-e032-476c-826f-cb970a5a1ade",
"created": "2014-05-08T09:00:00.000Z",
"modified": "2014-05-08T09:00:00.000Z",
"name": "File hash for Poison Ivy variant",
"pattern": "[file:hashes.'SHA-256' = 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c']",
"valid_from": "2014-05-08T09:00:00Z",
"labels": [
"file-hash-watchlist"
]
}
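For readers curious about what a relationship traversal involves, the following sketch walks a list of relationship dictionaries by hand. It is a simplified model and not the Workbench implementation: related_ids and related_sketch are hypothetical helpers, and the data is trimmed down from the outputs shown above.

```python
def related_ids(obj_id, relationships):
    """Return the IDs at the other end of every relationship touching obj_id."""
    found = set()
    for rel in relationships:
        if rel["source_ref"] == obj_id:
            found.add(rel["target_ref"])
        elif rel["target_ref"] == obj_id:
            found.add(rel["source_ref"])
    return found


def related_sketch(obj_id, relationships, objects, type_filter=None):
    """Return related objects, optionally restricted to a single type."""
    ids = related_ids(obj_id, relationships)
    return [
        obj for obj in objects
        if obj["id"] in ids and (type_filter is None or obj["type"] == type_filter)
    ]


MALWARE_ID = "malware--fdd60b30-b67c-41e3-b0b9-f01faf20d111"
INDICATOR_ID = "indicator--a932fcc6-e032-476c-826f-cb970a5a1ade"

objects = [
    {"type": "malware", "id": MALWARE_ID, "name": "Poison Ivy"},
    {"type": "indicator", "id": INDICATOR_ID, "name": "File hash for Poison Ivy variant"},
]
relationships = [
    {"source_ref": INDICATOR_ID, "relationship_type": "indicates", "target_ref": MALWARE_ID},
]

found = related_sketch(MALWARE_ID, relationships, objects, type_filter="indicator")
print(found[0]["name"])
```

The type_filter argument plays the role that Filter('type', '=', 'indicator') plays in the Workbench call above.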
Creating STIX Data¶
To create a STIX object, just use that object’s class constructor. Once it’s created, add it to the workbench with save().
[10]:
identity = Identity(name="ACME Threat Intel Co.", identity_class="organization")
save(identity)
You can also set defaults for certain properties when creating objects. For example, let’s set the default creator to be the identity object we just created:
[11]:
set_default_creator(identity)
Now when we create an indicator (or any other STIX Domain Object), it will automatically have the right created_by_ref value.
[12]:
indicator = Indicator(labels=["malicious-activity"], pattern="[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']")
save(indicator)
indicator_creator = get(indicator.created_by_ref)
print(indicator_creator.name)
[12]:
ACME Threat Intel Co.
Defaults can also be set for the created timestamp, external references and object marking references.
Warning:
The workbench layer replaces STIX Object classes with special versions of them that use “wrappers” to provide extra functionality. Because of this, we recommend that you either use the workbench layer or the rest of the library, but not both. In other words, don’t import from both stix2.workbench and any other submodules of stix2.
API Reference¶
This section of documentation contains information on all of the classes and functions in the stix2 API, as given by the package’s docstrings.
Note
All the classes and functions detailed in the pages below are importable directly from stix2. See also: How imports work.
Python APIs for STIX 2.
confidence | Functions to operate with STIX2 Confidence scales.
core | STIX2 Core Objects and Methods.
datastore | Python STIX2 DataStore API.
environment | Python STIX2 Environment API.
exceptions | STIX2 Error Classes.
markings | Functions for working with STIX 2 Data Markings.
patterns | Classes to aid in working with the STIX 2 patterning language.
properties | Classes for representing properties of STIX Objects and Cyber Observables.
utils | Utility functions and classes for the STIX2 library.
v20 | STIX 2.0 API Objects.
v21 | STIX 2.1 API Objects.
workbench | Functions and class wrappers for interacting with STIX2 data at a high level.
Contributing¶
We’re thrilled that you’re interested in contributing to python-stix2! Here are some things you should know:
- contribution-guide.org has great ideas for contributing to any open-source project (not just python-stix2).
- All contributors must sign a Contributor License Agreement. See CONTRIBUTING.md in the project repository for specifics.
- If you are planning to implement a major feature (vs. fixing a bug), please discuss with a project maintainer first to ensure you aren’t duplicating the work of someone else, and that the feature is likely to be accepted.
Now, let’s get started!
Setting up a development environment¶
We recommend using a virtualenv.
1. Clone the repository. If you’re planning to make a pull request, you should fork the repository on GitHub and clone your fork instead of the main repo:
git clone https://github.com/yourusername/cti-python-stix2.git
2. Install development-related dependencies:
cd cti-python-stix2
pip install -r requirements.txt
3. Install pre-commit git hooks:
pre-commit install
At this point you should be able to make changes to the code.
Code style¶
All code should follow PEP 8. We allow for line lengths up to 160 characters, but any lines over 80 characters should be the exception rather than the rule. PEP 8 conformance will be tested automatically by Tox and Travis-CI (see below).
Testing¶
Note
All of the tools mentioned in this section are installed when you run pip install -r requirements.txt.
python-stix2 uses pytest for testing. We encourage the use of test-driven development (TDD), where you write (failing) tests that demonstrate a bug or proposed new feature before writing code that fixes the bug or implements the feature. Any code contributions to python-stix2 should come with new or updated tests.
To run the tests in your current Python environment, use the pytest command from the root project directory:
pytest
This should show all of the tests that ran, along with their status.
You can run a specific test file by passing it on the command line:
pytest stix2/test/test_<xxx>.py
To ensure that the test you wrote is running, you can deliberately add an assert False statement at the beginning of the test. This is another benefit of TDD, since you should be able to see the test failing (and ensure it’s being run) before making it pass.
tox allows you to test a package across multiple versions of Python. Setting up multiple Python environments is beyond the scope of this guide, but feel free to ask for help setting them up. Tox should be run from the root directory of the project:
tox
We aim for high test coverage, using the coverage.py library. Though it’s not an absolute requirement to maintain 100% coverage, all code contributions must be accompanied by tests. To run coverage and look for untested lines of code, run:
pytest --cov=stix2
coverage html
then look at the resulting report in htmlcov/index.html.
All commits pushed to the master branch or submitted as a pull request are tested with Travis-CI automatically.
Adding a dependency¶
One of the pre-commit hooks we use in our development environment enforces a consistent ordering of imports. If you need to add a new library as a dependency, please add it to the known_third_party section of .isort.cfg to make sure the import is sorted correctly.