object

Python APIs for STIX 2 Object-based Semantic Equivalence and Similarity.

check_property_present(prop, obj1, obj2)

Helper method that checks whether a property is present on both objects.

custom_pattern_based(pattern1, pattern2)

Performs a matching on Indicator Patterns.

Parameters:
  • pattern1 – An Indicator pattern
  • pattern2 – An Indicator pattern
Returns:

float – Number between 0.0 and 1.0 depending on match criteria.

exact_match(val1, val2)

Performs an exact match between two values. This method can be used for _ref equality checks when de-referencing is not possible.

Parameters:
  • val1 – A value suitable for an equality test.
  • val2 – A value suitable for an equality test.
Returns:

float – 1.0 if the value matches exactly, 0.0 otherwise.

list_reference_check(refs1, refs2, ds1, ds2, **weights)

For objects that contain multiple references (e.g., object_refs), perform the same de-reference procedure and perform object_similarity. The score influences the objects containing these references. The result is weighted on the number of unique objects that could 1) be de-referenced 2)

object_equivalence(obj1, obj2, prop_scores={}, threshold=70, ds1=None, ds2=None, ignore_spec_version=False, versioning_checks=False, max_depth=1, **weight_dict)

This method returns a boolean indicating whether two objects are semantically equivalent. Internally, it calls the object_similarity function and compares the resulting score against the given threshold value.

Parameters:
  • obj1 – A stix2 object instance
  • obj2 – A stix2 object instance
  • prop_scores – A dictionary that can hold individual property scores, weights, contributing score, matching score and sum of weights.
  • threshold – A numerical value between 0 and 100 giving the minimum similarity score at which the two objects are considered equivalent. This value can be tuned.
  • ds1 (optional) – A DataStore object instance from which to pull related objects
  • ds2 (optional) – A DataStore object instance from which to pull related objects
  • ignore_spec_version – A boolean indicating whether to test object types that belong to different spec versions (STIX 2.0 and STIX 2.1 for example). If set to True this check will be skipped.
  • versioning_checks – A boolean indicating whether to test multiple revisions of the same object (when present) to maximize similarity against a particular version. If set to True the algorithm will perform this step.
  • max_depth – A positive integer indicating the maximum recursion depth the algorithm can reach when de-referencing objects and performing the object_similarity algorithm.
  • weight_dict – A dictionary that can be used to override what checks are done to objects in the similarity process.
Returns:

bool – True if the result of the object similarity is greater than or equal to the threshold value, False otherwise.
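
The threshold comparison described above can be sketched in a few lines. This is only an illustration, not the library's code: `is_equivalent` is a hypothetical helper, and the score passed in is assumed to be a value already produced by object_similarity (0.0 to 100.0).

```python
def is_equivalent(similarity_score, threshold=70):
    """Hypothetical helper: True when the score meets or exceeds the threshold."""
    return similarity_score >= threshold


# A score equal to the threshold still counts as equivalent.
print(is_equivalent(85.0))
print(is_equivalent(70.0))
print(is_equivalent(69.9))
# Raising the threshold makes the check stricter.
print(is_equivalent(85.0, threshold=90))
```

Because the comparison is inclusive, tuning the threshold by even a fraction of a point can change the outcome for borderline pairs.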

Warning

Object types need to have property weights defined for the similarity process. Otherwise, those objects will not influence the final score. The WEIGHTS dictionary under stix2.equivalence.object can give you an idea of how to add new entries and pass them via the weight_dict argument. Similarly, the values or methods can be fine-tuned for a particular use case.

Note

Default weight_dict:

{
     "attack-pattern": {
         "name": [
             30,
             partial_string_based
         ],
         "external_references": [
             70,
             partial_external_reference_based
         ]
     },
     "campaign": {
         "name": [
             60,
             partial_string_based
         ],
         "aliases": [
             40,
             partial_list_based
         ]
     },
     "course-of-action": {
         "name": [
             60,
             partial_string_based
         ],
         "external_references": [
             40,
             partial_external_reference_based
         ]
     },
     "grouping": {
         "name": [
             20,
             partial_string_based
         ],
         "context": [
             20,
             partial_string_based
         ],
         "object_refs": [
             60,
             list_reference_check
         ]
     },
     "identity": {
         "name": [
             60,
             partial_string_based
         ],
         "identity_class": [
             20,
             exact_match
         ],
         "sectors": [
             20,
             partial_list_based
         ]
     },
     "incident": {
         "name": [
             30,
             partial_string_based
         ],
         "external_references": [
             70,
             partial_external_reference_based
         ]
     },
     "indicator": {
         "indicator_types": [
             15,
             partial_list_based
         ],
         "pattern": [
             80,
             custom_pattern_based
         ],
         "valid_from": [
             5,
             partial_timestamp_based
         ],
         "tdelta": 1
     },
     "intrusion-set": {
         "name": [
             20,
             partial_string_based
         ],
         "external_references": [
             60,
             partial_external_reference_based
         ],
         "aliases": [
             20,
             partial_list_based
         ]
     },
     "location": {
         "longitude_latitude": [
             34,
             partial_location_distance
         ],
         "region": [
             33,
             exact_match
         ],
         "country": [
             33,
             exact_match
         ],
         "threshold": 1000.0
     },
     "malware": {
         "malware_types": [
             20,
             partial_list_based
         ],
         "name": [
             80,
             partial_string_based
         ]
     },
     "marking-definition": {
         "name": [
             20,
             exact_match
         ],
         "definition": [
             60,
             exact_match
         ],
         "definition_type": [
             20,
             exact_match
         ]
     },
     "relationship": {
         "relationship_type": [
             20,
             exact_match
         ],
         "source_ref": [
             40,
             reference_check
         ],
         "target_ref": [
             40,
             reference_check
         ]
     },
     "report": {
         "name": [
             30,
             partial_string_based
         ],
         "published": [
             10,
             partial_timestamp_based
         ],
         "object_refs": [
             60,
             list_reference_check
         ],
         "tdelta": 1
     },
     "sighting": {
         "first_seen": [
             5,
             partial_timestamp_based
         ],
         "last_seen": [
             5,
             partial_timestamp_based
         ],
         "sighting_of_ref": [
             40,
             reference_check
         ],
         "observed_data_refs": [
             20,
             list_reference_check
         ],
         "where_sighted_refs": [
             20,
             list_reference_check
         ],
         "summary": [
             10,
             exact_match
         ]
     },
     "threat-actor": {
         "name": [
             60,
             partial_string_based
         ],
         "threat_actor_types": [
             20,
             partial_list_based
         ],
         "aliases": [
             20,
             partial_list_based
         ]
     },
     "tool": {
         "tool_types": [
             20,
             partial_list_based
         ],
         "name": [
             80,
             partial_string_based
         ]
     },
     "vulnerability": {
         "name": [
             30,
             partial_string_based
         ],
         "external_references": [
             70,
             partial_external_reference_based
         ]
     }
 }

Note

This implementation follows the Semantic Equivalence Committee Note. See the Committee Note for details.

object_similarity(obj1, obj2, prop_scores={}, ds1=None, ds2=None, ignore_spec_version=False, versioning_checks=False, max_depth=1, **weight_dict)

This method returns a score that measures how similar the two objects are.

Parameters:
  • obj1 – A stix2 object instance
  • obj2 – A stix2 object instance
  • prop_scores – A dictionary that can hold individual property scores, weights, contributing score, matching score and sum of weights.
  • ds1 (optional) – A DataStore object instance from which to pull related objects
  • ds2 (optional) – A DataStore object instance from which to pull related objects
  • ignore_spec_version – A boolean indicating whether to test object types that belong to different spec versions (STIX 2.0 and STIX 2.1 for example). If set to True this check will be skipped.
  • versioning_checks – A boolean indicating whether to test multiple revisions of the same object (when present) to maximize similarity against a particular version. If set to True the algorithm will perform this step.
  • max_depth – A positive integer indicating the maximum recursion depth the algorithm can reach when de-referencing objects and performing the object_similarity algorithm.
  • weight_dict – A dictionary that can be used to override what checks are done to objects in the similarity process.
Returns:

float – A number between 0.0 and 100.0 as a measurement of similarity.
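
As a rough illustration of how per-property results might roll up into a single 0.0 to 100.0 value, the sketch below takes a weighted average of per-property scores. It assumes that each property check yields a score between 0.0 and 1.0 and that only the properties actually compared contribute their weight; `combine_scores` is a hypothetical helper, not part of stix2.

```python
def combine_scores(weighted_results):
    """Hypothetical helper: combine (weight, score) pairs into a 0-100 value.

    Each score is a per-property result between 0.0 and 1.0. Only the
    properties that were actually compared should be passed in.
    """
    sum_weights = sum(weight for weight, _ in weighted_results)
    if sum_weights == 0:
        # Nothing was compared, so nothing can contribute to the score.
        return 0.0
    matching_score = sum(weight * score for weight, score in weighted_results)
    return 100.0 * matching_score / sum_weights


# Using the default "malware" weights: malware_types (20) and name (80).
# Assume half of the malware_types overlap and the names match exactly.
results = [(20, 0.5), (80, 1.0)]
print(combine_scores(results))
```

Under these assumptions, a heavily weighted property such as name dominates the result, which is why adjusting weights through weight_dict can shift scores noticeably.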

Warning

Object types need to have property weights defined for the similarity process. Otherwise, those objects will not influence the final score. The WEIGHTS dictionary under stix2.equivalence.object can give you an idea of how to add new entries and pass them via the weight_dict argument. Similarly, the values or methods can be fine-tuned for a particular use case.

Note

Default weight_dict:

{
     "attack-pattern": {
         "name": [
             30,
             partial_string_based
         ],
         "external_references": [
             70,
             partial_external_reference_based
         ]
     },
     "campaign": {
         "name": [
             60,
             partial_string_based
         ],
         "aliases": [
             40,
             partial_list_based
         ]
     },
     "course-of-action": {
         "name": [
             60,
             partial_string_based
         ],
         "external_references": [
             40,
             partial_external_reference_based
         ]
     },
     "grouping": {
         "name": [
             20,
             partial_string_based
         ],
         "context": [
             20,
             partial_string_based
         ],
         "object_refs": [
             60,
             list_reference_check
         ]
     },
     "identity": {
         "name": [
             60,
             partial_string_based
         ],
         "identity_class": [
             20,
             exact_match
         ],
         "sectors": [
             20,
             partial_list_based
         ]
     },
     "incident": {
         "name": [
             30,
             partial_string_based
         ],
         "external_references": [
             70,
             partial_external_reference_based
         ]
     },
     "indicator": {
         "indicator_types": [
             15,
             partial_list_based
         ],
         "pattern": [
             80,
             custom_pattern_based
         ],
         "valid_from": [
             5,
             partial_timestamp_based
         ],
         "tdelta": 1
     },
     "intrusion-set": {
         "name": [
             20,
             partial_string_based
         ],
         "external_references": [
             60,
             partial_external_reference_based
         ],
         "aliases": [
             20,
             partial_list_based
         ]
     },
     "location": {
         "longitude_latitude": [
             34,
             partial_location_distance
         ],
         "region": [
             33,
             exact_match
         ],
         "country": [
             33,
             exact_match
         ],
         "threshold": 1000.0
     },
     "malware": {
         "malware_types": [
             20,
             partial_list_based
         ],
         "name": [
             80,
             partial_string_based
         ]
     },
     "marking-definition": {
         "name": [
             20,
             exact_match
         ],
         "definition": [
             60,
             exact_match
         ],
         "definition_type": [
             20,
             exact_match
         ]
     },
     "relationship": {
         "relationship_type": [
             20,
             exact_match
         ],
         "source_ref": [
             40,
             reference_check
         ],
         "target_ref": [
             40,
             reference_check
         ]
     },
     "report": {
         "name": [
             30,
             partial_string_based
         ],
         "published": [
             10,
             partial_timestamp_based
         ],
         "object_refs": [
             60,
             list_reference_check
         ],
         "tdelta": 1
     },
     "sighting": {
         "first_seen": [
             5,
             partial_timestamp_based
         ],
         "last_seen": [
             5,
             partial_timestamp_based
         ],
         "sighting_of_ref": [
             40,
             reference_check
         ],
         "observed_data_refs": [
             20,
             list_reference_check
         ],
         "where_sighted_refs": [
             20,
             list_reference_check
         ],
         "summary": [
             10,
             exact_match
         ]
     },
     "threat-actor": {
         "name": [
             60,
             partial_string_based
         ],
         "threat_actor_types": [
             20,
             partial_list_based
         ],
         "aliases": [
             20,
             partial_list_based
         ]
     },
     "tool": {
         "tool_types": [
             20,
             partial_list_based
         ],
         "name": [
             80,
             partial_string_based
         ]
     },
     "vulnerability": {
         "name": [
             30,
             partial_string_based
         ],
         "external_references": [
             70,
             partial_external_reference_based
         ]
     }
 }

Note

This implementation follows the Semantic Equivalence Committee Note. See the Committee Note for details.

partial_external_reference_based(ext_refs1, ext_refs2)

Performs a matching on External References.

Parameters:
  • ext_refs1 – A list of external references.
  • ext_refs2 – A list of external references.
Returns:

float – Number between 0.0 and 1.0 depending on matches.

partial_list_based(l1, l2)

Performs a partial list matching via finding the intersection between common values. Repeated values are counted only once. This method can be used for _refs equality checks when de-reference is not possible.

Parameters:
  • l1 – A list of values.
  • l2 – A list of values.
Returns:

float – Number between 0.0 and 1.0 depending on match criteria.
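
The intersection-based matching can be pictured with the following sketch. It assumes the overlap is normalized by the larger of the two de-duplicated lists; the library may normalize differently, and `list_overlap` is a hypothetical name used only for this example.

```python
def list_overlap(l1, l2):
    """Hypothetical helper: share of distinct values common to both lists."""
    # Converting to sets means repeated values are counted only once.
    set1, set2 = set(l1), set(l2)
    if not set1 or not set2:
        return 0.0
    return len(set1 & set2) / max(len(set1), len(set2))


# Order and repetition do not matter.
print(list_overlap(["a", "b"], ["b", "a", "a"]))
# One of two distinct values is shared.
print(list_overlap(["trojan", "ransomware"], ["trojan"]))
```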

partial_location_distance(lat1, long1, lat2, long2, threshold)

Given two coordinates, performs a match based on the distance between them, computed using the haversine formula.

Parameters:
  • lat1 – Latitude value for first coordinate point.
  • long1 – Longitude value for first coordinate point.
  • lat2 – Latitude value for second coordinate point.
  • long2 – Longitude value for second coordinate point.
  • threshold (float) – A kilometer measurement for the threshold distance between these two points.
Returns:

float – Number between 0.0 and 1.0 depending on match.
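
For readers unfamiliar with the haversine formula, here is a standard-library sketch of a distance-based score. The scoring rule (1.0 for identical points, falling linearly to 0.0 at the threshold distance) and the mean Earth radius of 6371 km are assumptions of this example, and `haversine_km` and `location_score` are hypothetical names rather than stix2 functions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in kilometers between two points given in degrees."""
    lat1, long1, lat2, long2 = map(math.radians, (lat1, long1, lat2, long2))
    dlat = lat2 - lat1
    dlong = long2 - long1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlong / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def location_score(lat1, long1, lat2, long2, threshold):
    """Hypothetical scoring: 1.0 at zero distance, 0.0 at or beyond `threshold` km."""
    distance = haversine_km(lat1, long1, lat2, long2)
    return 1.0 - min(distance / threshold, 1.0)


# Two nearby points score close to 1.0 with the default 1000 km threshold.
print(location_score(38.9, -77.0, 39.3, -76.6, 1000.0))
```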

partial_string_based(str1, str2)

Performs a partial string match using the Jaro-Winkler distance algorithm.

Parameters:
  • str1 – A string value to check.
  • str2 – A string value to check.
Returns:

float – Number between 0.0 and 1.0 depending on match criteria.
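
The textbook Jaro-Winkler similarity can be written with the standard library alone, as sketched below. This is not the library's implementation, which may rely on a third-party package and apply its own preprocessing; `jaro`, `jaro_winkler`, and the 0.1 prefix scale are choices made for this example.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this many positions of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The prefix boost means that names sharing their first few characters score higher than names with the same number of differences elsewhere.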

partial_timestamp_based(t1, t2, tdelta)

Performs a timestamp-based matching via checking how close one timestamp is to another.

Parameters:
  • t1 – A datetime string or STIXdatetime object.
  • t2 – A datetime string or STIXdatetime object.
  • tdelta (float) – A time delta expressed in days. This number is multiplied by 86400 seconds (1 day) to extend or shrink your time change tolerance.
Returns:

float – Number between 0.0 and 1.0 depending on match criteria.
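
One way to turn "how close one timestamp is to another" into a score is shown below. The linear falloff is an assumption of this sketch, `timestamp_score` is a hypothetical name, and for simplicity it accepts only datetime objects, not datetime strings.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def timestamp_score(t1, t2, tdelta):
    """Hypothetical scoring: 1.0 for identical timestamps, 0.0 once they are
    `tdelta` days or more apart, and linear in between."""
    gap_seconds = abs((t1 - t2).total_seconds())
    return 1.0 - min(gap_seconds / (SECONDS_PER_DAY * tdelta), 1.0)


t1 = datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)

# Twelve hours apart with a one-day tolerance.
print(timestamp_score(t1, t2, tdelta=1))
# The same gap with a two-day tolerance scores higher.
print(timestamp_score(t1, t2, tdelta=2))
```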

reference_check(ref1, ref2, ds1, ds2, **weights)

For two references, de-reference the object and perform object_similarity. The score influences the result of an edge check.