graph
Python APIs for STIX 2 Graph-based Semantic Equivalence and Similarity.

graph_equivalence(ds1, ds2, prop_scores={}, threshold=70, ignore_spec_version=False, versioning_checks=False, max_depth=1, **weight_dict)
This method returns True if two graphs are semantically equivalent and False otherwise. Internally, it calls the graph_similarity function and compares the resulting score against the given threshold value.
Parameters:  ds1 – A DataStore object instance representing your graph
 ds2 – A DataStore object instance representing your graph
 prop_scores – A dictionary that can hold individual property scores, weights, contributing score, matching score and sum of weights.
 threshold – A numerical value between 0 and 100 to determine the minimum score to result in successfully calling both graphs equivalent. This value can be tuned.
 ignore_spec_version – A boolean indicating whether to test object types that belong to different spec versions (STIX 2.0 and STIX 2.1 for example). If set to True this check will be skipped.
 versioning_checks – A boolean indicating whether to test multiple revisions of the same object (when present) to maximize similarity against a particular version. If set to True the algorithm will perform this step.
 max_depth – A positive integer indicating the maximum recursion depth the algorithm can reach when dereferencing objects and performing the object_similarity algorithm.
 weight_dict – A dictionary that can be used to override what checks are done to objects in the similarity process.
Returns: bool – True if the result of the graph similarity is greater than or equal to the threshold value. False otherwise.
Warning
Object types need to have property weights defined for the similarity process. Otherwise, those objects will not influence the final score. The WEIGHTS dictionary under stix2.equivalence.graph shows how entries are structured, so you can add new ones and pass them via the weight_dict argument. Similarly, the weight values or comparison methods can be fine-tuned for a particular use case.
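As a rough illustration of why an object type without weights cannot influence the score, the sketch below combines per-property comparisons into a 0 to 100 value. It is not the library's implementation: sketch_object_score, the local exact_match stand-in, and the "x-custom-type" entry are all hypothetical names introduced here, and the weighted-average formula is an assumption about how [weight, method] entries are combined.

```python
def exact_match(val1, val2):
    """Hypothetical comparison method: 1.0 on equality, otherwise 0.0."""
    return 1.0 if val1 == val2 else 0.0


def sketch_object_score(obj1, obj2, type_weights):
    """Combine per-property comparisons into a 0-100 score (illustrative only)."""
    sum_weights = 0.0
    matching_score = 0.0
    for prop, entry in type_weights.items():
        # Scalar settings such as "tdelta" or "threshold" are not [weight, method] pairs.
        if not isinstance(entry, (list, tuple)):
            continue
        # Assumption: only properties present on both objects are compared.
        if prop not in obj1 or prop not in obj2:
            continue
        weight, method = entry
        sum_weights += weight
        matching_score += weight * method(obj1[prop], obj2[prop])
    if sum_weights == 0:
        # No weights defined (or nothing comparable): the object contributes nothing.
        return 0.0
    return 100.0 * matching_score / sum_weights


# Hypothetical weights for a custom object type, shaped like a weight_dict entry.
custom_weights = {
    "x-custom-type": {
        "name": [60, exact_match],
        "category": [40, exact_match],
    }
}

a = {"type": "x-custom-type", "name": "alpha", "category": "red"}
b = {"type": "x-custom-type", "name": "alpha", "category": "blue"}

score = sketch_object_score(a, b, custom_weights["x-custom-type"])
unweighted = sketch_object_score(a, b, {})
```

Here only "name" matches, so 60 of the 100 available weight points are earned; with an empty weights entry the function has nothing to compare and returns 0.0.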
Note
Default weight_dict:
{
    "attack-pattern": { "name": [30, partial_string_based], "external_references": [70, partial_external_reference_based] },
    "campaign": { "name": [60, partial_string_based], "aliases": [40, partial_list_based] },
    "course-of-action": { "name": [60, partial_string_based], "external_references": [40, partial_external_reference_based] },
    "grouping": { "name": [20, partial_string_based], "context": [20, partial_string_based], "object_refs": [60, list_reference_check] },
    "identity": { "name": [60, partial_string_based], "identity_class": [20, exact_match], "sectors": [20, partial_list_based] },
    "incident": { "name": [30, partial_string_based], "external_references": [70, partial_external_reference_based] },
    "indicator": { "indicator_types": [15, partial_list_based], "pattern": [80, custom_pattern_based], "valid_from": [5, partial_timestamp_based], "tdelta": 1 },
    "intrusion-set": { "name": [20, partial_string_based], "external_references": [60, partial_external_reference_based], "aliases": [20, partial_list_based] },
    "location": { "longitude_latitude": [34, partial_location_distance], "region": [33, exact_match], "country": [33, exact_match], "threshold": 1000.0 },
    "malware": { "malware_types": [20, partial_list_based], "name": [80, partial_string_based] },
    "marking-definition": { "name": [20, exact_match], "definition": [60, exact_match], "definition_type": [20, exact_match] },
    "relationship": { "relationship_type": [20, exact_match], "source_ref": [40, reference_check], "target_ref": [40, reference_check] },
    "report": { "name": [30, partial_string_based], "published": [10, partial_timestamp_based], "object_refs": [60, list_reference_check], "tdelta": 1 },
    "sighting": { "first_seen": [5, partial_timestamp_based], "last_seen": [5, partial_timestamp_based], "sighting_of_ref": [40, reference_check], "observed_data_refs": [20, list_reference_check], "where_sighted_refs": [20, list_reference_check], "summary": [10, exact_match] },
    "threat-actor": { "name": [60, partial_string_based], "threat_actor_types": [20, partial_list_based], "aliases": [20, partial_list_based] },
    "tool": { "tool_types": [20, partial_list_based], "name": [80, partial_string_based] },
    "vulnerability": { "name": [30, partial_string_based], "external_references": [70, partial_external_reference_based] }
}
Note
This implementation follows the Semantic Equivalence Committee Note; see the Committee Note for details.
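The relationship between the similarity score and the threshold described for graph_equivalence can be sketched in a few lines. This is a minimal stand-in, not the library function: sketch_graph_equivalence is a hypothetical name, and it takes an already computed score rather than two DataStore objects.

```python
def sketch_graph_equivalence(similarity_score, threshold=70):
    """Return True when the score reaches the threshold (inclusive comparison)."""
    return similarity_score >= threshold


# A score exactly at the threshold counts as equivalent.
at_threshold = sketch_graph_equivalence(70.0)

# The same score can flip the verdict when the threshold is tuned.
lenient = sketch_graph_equivalence(65.0, threshold=60)
strict = sketch_graph_equivalence(65.0, threshold=90)
```

Because the verdict depends only on the comparison, raising the threshold makes the check stricter without changing how the score itself is computed.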

graph_similarity(ds1, ds2, prop_scores={}, ignore_spec_version=False, versioning_checks=False, max_depth=1, **weight_dict)
This method returns a similarity score for two given graphs. Each DataStore can contain a connected or disconnected graph, and the final result is weighted over the number of objects that could be compared. This approach builds on the object-based similarity process, in which each comparison returns a value between 0 and 100.
Parameters:  ds1 – A DataStore object instance representing your graph
 ds2 – A DataStore object instance representing your graph
 prop_scores – A dictionary that can hold individual property scores, weights, contributing score, matching score and sum of weights.
 ignore_spec_version – A boolean indicating whether to test object types that belong to different spec versions (STIX 2.0 and STIX 2.1 for example). If set to True this check will be skipped.
 versioning_checks – A boolean indicating whether to test multiple revisions of the same object (when present) to maximize similarity against a particular version. If set to True the algorithm will perform this step.
 max_depth – A positive integer indicating the maximum recursion depth the algorithm can reach when dereferencing objects and performing the object_similarity algorithm.
 weight_dict – A dictionary that can be used to override what checks are done to objects in the similarity process.
Returns: float – A number between 0.0 and 100.0 as a measurement of similarity.
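One plausible reading of "weighted over the number of objects compared" is a plain average of the best per-object scores. The sketch below shows that reading only; sketch_graph_score and the placeholder identifiers are hypothetical, and the actual aggregation in the library may differ.

```python
def sketch_graph_score(best_match_scores):
    """Average the best per-object scores; 0.0 when nothing could be compared."""
    if not best_match_scores:
        return 0.0
    return sum(best_match_scores.values()) / len(best_match_scores)


# Placeholder identifiers mapped to the best score found for each object (0-100).
best = {
    "campaign--1": 100.0,
    "malware--1": 50.0,
    "tool--1": 0.0,
}

graph_score = sketch_graph_score(best)
empty_score = sketch_graph_score({})
```

Under this assumption, objects that could not be compared at all (for example, types without weights) never enter the dictionary, so they neither raise nor lower the result.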
Warning
Object types need to have property weights defined for the similarity process. Otherwise, those objects will not influence the final score. The WEIGHTS dictionary under stix2.equivalence.graph shows how entries are structured, so you can add new ones and pass them via the weight_dict argument. Similarly, the weight values or comparison methods can be fine-tuned for a particular use case.
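Since weight_dict is collected from keyword arguments, overrides are keyed by object type. The sketch below assumes that an override replaces the whole entry for that type rather than merging property by property; that behavior, the sketch_merge_weights helper, and the string stand-ins for comparison methods are assumptions made for illustration, so check the library source before relying on them.

```python
# Trimmed-down stand-in for the defaults; method names are plain strings here.
DEFAULTS = {
    "campaign": {"name": [60, "partial_string_based"], "aliases": [40, "partial_list_based"]},
    "tool": {"tool_types": [20, "partial_list_based"], "name": [80, "partial_string_based"]},
}


def sketch_merge_weights(defaults, **weight_dict):
    """Shallow-merge per-type overrides on top of the defaults (illustrative only)."""
    merged = dict(defaults)     # copy so the defaults themselves stay untouched
    merged.update(weight_dict)  # an override replaces the entire entry for its type
    return merged


# Override one type by keyword.
merged = sketch_merge_weights(DEFAULTS, campaign={"name": [100, "exact_match"]})

# Type names containing hyphens are not valid keyword names,
# so they have to be passed by unpacking a dictionary.
hyphenated = sketch_merge_weights(
    DEFAULTS, **{"attack-pattern": {"name": [100, "exact_match"]}}
)
```

With whole-entry replacement, the overridden "campaign" entry no longer contains "aliases", so an override should list every property that is meant to be scored.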
Note
Default weight_dict:
{
    "attack-pattern": { "name": [30, partial_string_based], "external_references": [70, partial_external_reference_based] },
    "campaign": { "name": [60, partial_string_based], "aliases": [40, partial_list_based] },
    "course-of-action": { "name": [60, partial_string_based], "external_references": [40, partial_external_reference_based] },
    "grouping": { "name": [20, partial_string_based], "context": [20, partial_string_based], "object_refs": [60, list_reference_check] },
    "identity": { "name": [60, partial_string_based], "identity_class": [20, exact_match], "sectors": [20, partial_list_based] },
    "incident": { "name": [30, partial_string_based], "external_references": [70, partial_external_reference_based] },
    "indicator": { "indicator_types": [15, partial_list_based], "pattern": [80, custom_pattern_based], "valid_from": [5, partial_timestamp_based], "tdelta": 1 },
    "intrusion-set": { "name": [20, partial_string_based], "external_references": [60, partial_external_reference_based], "aliases": [20, partial_list_based] },
    "location": { "longitude_latitude": [34, partial_location_distance], "region": [33, exact_match], "country": [33, exact_match], "threshold": 1000.0 },
    "malware": { "malware_types": [20, partial_list_based], "name": [80, partial_string_based] },
    "marking-definition": { "name": [20, exact_match], "definition": [60, exact_match], "definition_type": [20, exact_match] },
    "relationship": { "relationship_type": [20, exact_match], "source_ref": [40, reference_check], "target_ref": [40, reference_check] },
    "report": { "name": [30, partial_string_based], "published": [10, partial_timestamp_based], "object_refs": [60, list_reference_check], "tdelta": 1 },
    "sighting": { "first_seen": [5, partial_timestamp_based], "last_seen": [5, partial_timestamp_based], "sighting_of_ref": [40, reference_check], "observed_data_refs": [20, list_reference_check], "where_sighted_refs": [20, list_reference_check], "summary": [10, exact_match] },
    "threat-actor": { "name": [60, partial_string_based], "threat_actor_types": [20, partial_list_based], "aliases": [20, partial_list_based] },
    "tool": { "tool_types": [20, partial_list_based], "name": [80, partial_string_based] },
    "vulnerability": { "name": [30, partial_string_based], "external_references": [70, partial_external_reference_based] }
}
Note
This implementation follows the Semantic Equivalence Committee Note; see the Committee Note for details.