{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# Delete this cell to re-enable tracebacks\n", "import sys\n", "ipython = get_ipython()\n", "\n", "def hide_traceback(exc_tuple=None, filename=None, tb_offset=None,\n", "                   exception_only=False, running_compiled_code=False):\n", "    etype, value, tb = sys.exc_info()\n", "    value.__cause__ = None  # suppress chained exceptions\n", "    return ipython._showtraceback(etype, value, ipython.InteractiveTB.get_exception_only(etype, value))\n", "\n", "ipython.showtraceback = hide_traceback" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# JSON output syntax highlighting\n", "from __future__ import print_function\n", "from pygments import highlight\n", "from pygments.lexers import JsonLexer, TextLexer\n", "from pygments.formatters import HtmlFormatter\n", "from IPython.display import display, HTML\n", "from IPython.core.interactiveshell import InteractiveShell\n", "\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "def json_print(inpt):\n", "    string = str(inpt)\n", "    formatter = HtmlFormatter()\n", "    if string[0] == '{':\n", "        lexer = JsonLexer()\n", "    else:\n", "        lexer = TextLexer()\n", "    return HTML('<style type=\"text/css\">{}</style>{}'.format(\n", "                formatter.get_style_defs('.highlight'),\n", "                highlight(string, lexer, formatter)))\n", "\n", "globals()['print'] = json_print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Object Similarity and Equivalence\n", "\n", "The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has functions for checking if two STIX Objects are very similar or identical. The functions differentiate between equivalence, which is a binary concept (two things are either equivalent or they are not), and similarity, which is a continuum (an object can be more similar to one object than to another). 
The similarity function answers the question, “How similar are these two objects?” while the equivalence function uses the similarity function to answer the question, “Are these two objects equivalent?”\n", "\n", "For each supported object type, the [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) function checks whether the values for a specific set of properties match. Each matching property is then weighted, since not every property is equally important for semantic similarity. The result is the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the two objects are not equivalent, and a result of 100 means that they are equivalent. Values in between indicate how similar the two objects are and can be used to decide whether they should be considered equivalent. The [object_equivalence()](../api/stix2.environment.rst#stix2.environment.Environment.object_equivalence) function calls [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) and compares the result to a threshold to determine if the objects are equivalent. Different organizations or users may use different thresholds.\n", "\n", "TODO: Add a link to the committee note when it is released.\n", "\n", "There are a number of use cases for which calculating semantic equivalence may be helpful. It can be used for echo detection, in which a STIX producer who consumes content from other producers wants to make sure they are not creating content they have already seen or consuming content they have already created.\n", "\n", "Another use case for this functionality is to identify identical or near-identical content, such as a vulnerability shared under three different nicknames by three different STIX producers. A third use case involves a feed that aggregates data from multiple other sources. 
It will want to make sure that it is not publishing duplicate data.\n", "\n", "Below we will show examples of the semantic similarity results of various objects. Unless otherwise specified, the ID of each object will be generated by the library, so the two objects will not have the same ID. This demonstrates that the semantic similarity algorithm only looks at specific properties for each object type. Each example also shows the result of calling the equivalence function, with a threshold value of `90`.\n", "\n", "**Please note** that you will need to install a few extra dependencies in order to use the semantic equivalence functions. You can do this using:\n", "\n", "```pip install stix2[semantic]```\n", "\n", "### Attack Pattern Example\n", "\n", "For Attack Patterns, the only properties that contribute to semantic similarity are `name` and `external_references`, with weights of 30 and 70, respectively. In this example, both attack patterns have the same external reference but the second has a slightly different yet still similar name." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
91.81818181818181\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
True\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import stix2\n", "from stix2 import AttackPattern, Environment, MemoryStore\n", "\n", "env = Environment(store=MemoryStore())\n", "\n", "ap1 = AttackPattern(\n", "    name=\"Phishing\",\n", "    external_references=[\n", "        {\n", "            \"url\": \"https://example2\",\n", "            \"source_name\": \"some-source2\",\n", "        },\n", "    ],\n", ")\n", "ap2 = AttackPattern(\n", "    name=\"Spear phishing\",\n", "    external_references=[\n", "        {\n", "            \"url\": \"https://example2\",\n", "            \"source_name\": \"some-source2\",\n", "        },\n", "    ],\n", ")\n", "print(env.object_similarity(ap1, ap2))\n", "print(env.object_equivalence(ap1, ap2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Campaign Example\n", "\n", "For Campaigns, the only properties that contribute to semantic similarity are `name` and `aliases`, with weights of 60 and 40, respectively. In this example, the two campaigns have different names and neither defines `aliases`, so the score reflects only how similar the two `name` strings are." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
30.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
False\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Campaign\n", "\n", "c1 = Campaign(\n", " name=\"Someone Attacks Somebody\",)\n", "\n", "c2 = Campaign(\n", " name=\"Another Campaign\",)\n", "print(env.object_similarity(c1, c2))\n", "print(env.object_equivalence(c1, c2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Identity Example\n", "\n", "For Identities, the only properties that contribute to semantic similarity are `name`, `identity_class`, and `sectors`, with weights of 60, 20, and 20, respectively. In this example, the two identities are identical, but are missing one of the contributing properties. The algorithm only compares properties that are actually present on the objects. Also note that they have completely different description properties, but because description is not one of the properties considered for semantic similarity, this difference has no effect on the result." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
True\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Identity\n", "\n", "id1 = Identity(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", " description=\"Just some guy\",\n", ")\n", "id2 = Identity(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", " description=\"A person\",\n", ")\n", "print(env.object_similarity(id1, id2))\n", "print(env.object_equivalence(id1, id2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indicator Example\n", "\n", "For Indicators, the only properties that contribute to semantic similarity are `indicator_types`, `pattern`, and `valid_from`, with weights of 15, 80, and 5, respectively. In this example, the two indicators have patterns with different hashes but the same indicator_type and valid_from. For patterns, the algorithm currently only checks if they are identical." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
20.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
False\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Indicator\n", "\n", "ind1 = Indicator(\n", " indicator_types=['malicious-activity'],\n", " pattern_type=\"stix\",\n", " pattern=\"[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']\",\n", " valid_from=\"2017-01-01T12:34:56Z\",\n", ")\n", "ind2 = Indicator(\n", " indicator_types=['malicious-activity'],\n", " pattern_type=\"stix\",\n", " pattern=\"[file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4']\",\n", " valid_from=\"2017-01-01T12:34:56Z\",\n", ")\n", "print(env.object_similarity(ind1, ind2))\n", "print(env.object_equivalence(ind1, ind2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the patterns were identical the result would have been 100." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Location Example\n", "\n", "For Locations, the only properties that contribute to semantic similarity are `longitude`/`latitude`, `region`, and `country`, with weights of 34, 33, and 33, respectively. In this example, the two locations are Washington, D.C. and New York City. The algorithm computes the distance between two locations using the haversine formula and uses that to influence similarity." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
67.20663955882583\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
False\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Location\n", "\n", "loc1 = Location(\n", " latitude=38.889,\n", " longitude=-77.023,\n", ")\n", "loc2 = Location(\n", " latitude=40.713,\n", " longitude=-74.006,\n", ")\n", "print(env.object_similarity(loc1, loc2))\n", "print(env.object_equivalence(loc1, loc2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Malware Example\n", "\n", "For Malware, the only properties that contribute to semantic similarity are `malware_types` and `name`, with weights of 20 and 80, respectively. In this example, the two malware objects only differ in the strings in their malware_types lists. For lists, the algorithm bases its calculations on the intersection of the two lists. An empty intersection will result in a 0, and a complete intersection will result in a 1 for that property." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
90.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
True\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Malware\n", "\n", "MALWARE_ID = \"malware--9c4638ec-f1de-4ddb-abf4-1b760417654e\"\n", "\n", "mal1 = Malware(id=MALWARE_ID,\n", " malware_types=['ransomware'],\n", " name=\"Cryptolocker\",\n", " is_family=False,\n", " )\n", "mal2 = Malware(id=MALWARE_ID,\n", " malware_types=['ransomware', 'dropper'],\n", " name=\"Cryptolocker\",\n", " is_family=False,\n", " )\n", "print(env.object_similarity(mal1, mal2))\n", "print(env.object_equivalence(mal1, mal2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Threat Actor Example\n", "\n", "For Threat Actors, the only properties that contribute to semantic similarity are `threat_actor_types`, `name`, and `aliases`, with weights of 20, 60, and 20, respectively. In this example, the two threat actors have the same id properties but everything else is different. Since the id property does not factor into semantic similarity, the result is not very high. The result is not zero because of the \"Token Sort Ratio\" algorithm used to compare the `name` property." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
6.66666666666667\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
False\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import ThreatActor\n", "\n", "THREAT_ACTOR_ID = \"threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f\"\n", "\n", "ta1 = ThreatActor(id=THREAT_ACTOR_ID,\n", " threat_actor_types=[\"crime-syndicate\"],\n", " name=\"Evil Org\",\n", " aliases=[\"super-evil\"],\n", ")\n", "ta2 = ThreatActor(id=THREAT_ACTOR_ID,\n", " threat_actor_types=[\"spy\"],\n", " name=\"James Bond\",\n", " aliases=[\"007\"],\n", ")\n", "print(env.object_similarity(ta1, ta2))\n", "print(env.object_equivalence(ta1, ta2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tool Example\n", "\n", "For Tools, the only properties that contribute to semantic similarity are `tool_types` and `name`, with weights of 20 and 80, respectively. In this example, the two tools have the same values for properties that contribute to semantic similarity but one has an additional, non-contributing property." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
True\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Tool\n", "\n", "t1 = Tool(\n", " tool_types=[\"remote-access\"],\n", " name=\"VNC\",\n", ")\n", "t2 = Tool(\n", " tool_types=[\"remote-access\"],\n", " name=\"VNC\",\n", " description=\"This is a tool\"\n", ")\n", "print(env.object_similarity(t1, t2))\n", "print(env.object_equivalence(t1, t2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Vulnerability Example\n", "\n", "For Vulnerabilities, the only properties that contribute to semantic similarity are `name` and `external_references`, with weights of 30 and 70, respectively. In this example, the two vulnerabilities have the same name but one also has an external reference. The algorithm doesn't take into account any semantic similarity contributing properties that are not present on both objects." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
True\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Vulnerability\n", "\n", "vuln1 = Vulnerability(\n", " name=\"Heartbleed\",\n", " external_references=[\n", " {\n", " \"url\": \"https://example\",\n", " \"source_name\": \"some-source\",\n", " },\n", " ],\n", ")\n", "vuln2 = Vulnerability(\n", " name=\"Heartbleed\",\n", ")\n", "print(env.object_similarity(vuln1, vuln2))\n", "print(env.object_equivalence(vuln1, vuln2, threshold=90))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other Examples\n", "\n", "Comparing objects of different types will result in a `ValueError`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "The objects to compare must be of the same type!", "output_type": "error", "traceback": [ "\u001b[0;31mValueError\u001b[0m\u001b[0;31m:\u001b[0m The objects to compare must be of the same type!\n" ] } ], "source": [ "print(env.object_similarity(ind1, vuln1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some object types do not have a defined method for calculating semantic similarity and by default will give a warning and a result of zero." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "'report' type has no 'weights' dict specified & thus no object similarity method to call!\n" ] }, { "data": { "text/html": [ "
0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2 import Report\n", "\n", "r1 = Report(\n", " report_types=[\"campaign\"],\n", " name=\"Bad Cybercrime\",\n", " published=\"2016-04-06T20:03:00.000Z\",\n", " object_refs=[\"indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7\"],\n", ")\n", "r2 = Report(\n", " report_types=[\"campaign\"],\n", " name=\"Bad Cybercrime\",\n", " published=\"2016-04-06T20:03:00.000Z\",\n", " object_refs=[\"indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7\"],\n", ")\n", "print(env.object_similarity(r1, r2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, comparing objects of different spec versions will result in a `ValueError`." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "The objects to compare must be of the same spec version!", "output_type": "error", "traceback": [ "\u001b[0;31mValueError\u001b[0m\u001b[0;31m:\u001b[0m The objects to compare must be of the same spec version!\n" ] } ], "source": [ "from stix2.v20 import Identity as Identity20\n", "\n", "id20 = Identity20(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", ")\n", "print(env.object_similarity(id2, id20))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can optionally allow comparing across spec versions by providing a configuration dictionary using `ignore_spec_version` like in the next example:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v20 import Identity as Identity20\n", "\n", "id20 = Identity20(\n", "    name=\"John Smith\",\n", "    identity_class=\"individual\",\n", ")\n", "print(env.object_similarity(id2, id20, **{\"_internal\": {\"ignore_spec_version\": True}}))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Detailed Results\n", "\n", "If your logging level is set to `DEBUG` or lower, the function will log more detailed results. These list the comparison result and weight for each property that is checked, showing how the final score was reached." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Starting object similarity process between: 'threat-actor--54040762-8540-4c37-8f6d-6ebcc20da2b5' and 'threat-actor--b2a6f234-5594-42d9-9cdb-f4b82bc575a6'\n", "--\t\tpartial_string_based 'Evil Org' 'James Bond'\tresult: '11.111111111111114'\n", "'name' check -- weight: 60, contributing score: 6.666666666666669\n", "--\t\tpartial_list_based '['crime-syndicate']' '['spy']'\tresult: '0.0'\n", "'threat_actor_types' check -- weight: 20, contributing score: 0.0\n", "--\t\tpartial_list_based '['super-evil']' '['007']'\tresult: '0.0'\n", "'aliases' check -- weight: 20, contributing score: 0.0\n", "Matching Score: 6.666666666666669, Sum of Weights: 100.0\n" ] }, { "data": { "text/html": [ "
6.66666666666667\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import logging\n", "logging.basicConfig(format='%(message)s')\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.DEBUG)\n", "\n", "ta3 = ThreatActor(\n", "    threat_actor_types=[\"crime-syndicate\"],\n", "    name=\"Evil Org\",\n", "    aliases=[\"super-evil\"],\n", ")\n", "ta4 = ThreatActor(\n", "    threat_actor_types=[\"spy\"],\n", "    name=\"James Bond\",\n", "    aliases=[\"007\"],\n", ")\n", "print(env.object_similarity(ta3, ta4))\n", "\n", "logger.setLevel(logging.ERROR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also retrieve the detailed results in a dictionary so that they can be accessed and used programmatically. The [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) function takes an optional third argument, called `prop_scores`. This argument should be a dictionary into which the detailed debugging information will be stored.\n", "\n", "To use `prop_scores`, pass in a dictionary to [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity). After the function has finished executing, the dictionary will contain the various scores. Specifically, it will have the overall `matching_score` and `sum_weights`, along with the weight and contributing score for each of the properties that contribute to semantic similarity.\n", "\n", "For example:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Semantic equivalence score using standard weights: 16.666666666666668\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
{'name': {'weight': 60, 'contributing_score': 6.666666666666669}, 'threat_actor_types': {'weight': 20, 'contributing_score': 10.0}, 'aliases': {'weight': 20, 'contributing_score': 0.0}, 'matching_score': 16.666666666666668, 'sum_weights': 100.0}\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Prop: name | weight: 60 | contributing_score: 6.666666666666669\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Prop: threat_actor_types | weight: 20 | contributing_score: 10.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Prop: aliases | weight: 20 | contributing_score: 0.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
matching_score: 16.666666666666668\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
sum_weights: 100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ta5 = ThreatActor(\n", "    threat_actor_types=[\"crime-syndicate\", \"spy\"],\n", "    name=\"Evil Org\",\n", "    aliases=[\"super-evil\"],\n", ")\n", "ta6 = ThreatActor(\n", "    threat_actor_types=[\"spy\"],\n", "    name=\"James Bond\",\n", "    aliases=[\"007\"],\n", ")\n", "\n", "prop_scores = {}\n", "print(\"Semantic equivalence score using standard weights: %s\" % (env.object_similarity(ta5, ta6, prop_scores)))\n", "print(prop_scores)\n", "for prop in prop_scores:\n", "    if prop not in [\"matching_score\", \"sum_weights\"]:\n", "        print(\"Prop: %s | weight: %s | contributing_score: %s\" % (prop, prop_scores[prop]['weight'], prop_scores[prop]['contributing_score']))\n", "    else:\n", "        print(\"%s: %s\" % (prop, prop_scores[prop]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Comparisons\n", "If you wish, you can customize semantic comparisons. Specifically, you can do any of three things:\n", " - Provide custom weights for each semantic equivalence contributing property\n", " - Provide custom comparison functions for individual semantic equivalence contributing properties\n", " - Provide a custom semantic equivalence function for a specific object type\n", "\n", "#### The `weights` dictionary\n", "To do any of these (*optional*) custom comparisons, you will need to provide a `weights` dictionary, unpacked as keyword arguments (`**weights`), as the last parameter to the [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) method call.\n", "\n", "The weights dictionary should contain both the weight and the comparison function for each property. 
You may use the default weights and functions, or provide your own.\n", "\n", "##### Existing comparison functions\n", "For reference, here is a list of the comparison functions already built into the codebase (found in [stix2/equivalence/object](../api/equivalence/stix2.equivalence.object.rst#module-stix2.equivalence.object)):\n", "\n", " - [custom_pattern_based](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.custom_pattern_based)\n", " - [exact_match](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.exact_match)\n", " - [list_reference_check](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.list_reference_check)\n", " - [partial_external_reference_based](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.partial_external_reference_based)\n", " - [partial_list_based](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.partial_list_based)\n", " - [partial_location_distance](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.partial_location_distance)\n", " - [partial_string_based](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.partial_string_based)\n", " - [partial_timestamp_based](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.partial_timestamp_based)\n", " - [reference_check](../api/equivalence/stix2.equivalence.object.rst#stix2.equivalence.object.reference_check)\n", "\n", "For instance, to compare two of the `ThreatActor`s from before using our own weights, we could do the following:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Using standard weights: 16.666666666666668\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Using custom weights: 28.33333333333334\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights = {\n", "    \"threat-actor\": {  # You must specify the object type\n", "        \"name\": (30, stix2.equivalence.object.partial_string_based),  # Each property's value must be a tuple\n", "        \"threat_actor_types\": (50, stix2.equivalence.object.partial_list_based),  # The 1st component must be the weight\n", "        \"aliases\": (20, stix2.equivalence.object.partial_list_based)  # The 2nd component must be the comparison function\n", "    }\n", "}\n", "\n", "print(\"Using standard weights: %s\" % (env.object_similarity(ta5, ta6)))\n", "print(\"Using custom weights: %s\" % (env.object_similarity(ta5, ta6, **weights)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the semantic similarity scores differ solely because custom weights were used.\n", "\n", "#### Custom Weights With prop_scores\n", "If we want to use both `prop_scores` and `weights`, pass `prop_scores` as the third argument and unpack `weights` as keyword arguments after it in the call to [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity):" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10.000000000000002" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
{'name': {'weight': 45, 'contributing_score': 5.000000000000002}, 'threat_actor_types': {'weight': 10, 'contributing_score': 5.0}, 'aliases': {'weight': 45, 'contributing_score': 0.0}, 'matching_score': 10.000000000000002, 'sum_weights': 100.0}\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prop_scores = {}\n", "weights = {\n", " \"threat-actor\": {\n", " \"name\": (45, stix2.equivalence.object.partial_string_based),\n", " \"threat_actor_types\": (10, stix2.equivalence.object.partial_list_based),\n", " \"aliases\": (45, stix2.equivalence.object.partial_list_based),\n", " },\n", "}\n", "env.object_similarity(ta5, ta6, prop_scores, **weights)\n", "print(prop_scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Custom Semantic Similarity Functions\n", "You can also write and use your own semantic equivalence functions. In the examples above, you could replace the built-in comparison functions for any or all properties. For example, here we use a custom string comparison function just for the `'name'` property:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Using custom string comparison: 5.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def my_string_compare(p1, p2):\n", "    if p1 == p2:\n", "        return 1\n", "    else:\n", "        return 0\n", "\n", "weights = {\n", "    \"threat-actor\": {\n", "        \"name\": (45, my_string_compare),\n", "        \"threat_actor_types\": (10, stix2.equivalence.object.partial_list_based),\n", "        \"aliases\": (45, stix2.equivalence.object.partial_list_based),\n", "    },\n", "}\n", "print(\"Using custom string comparison: %s\" % (env.object_similarity(ta5, ta6, **weights)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also customize the comparison of an entire object type instead of just how each property is compared. To do this, provide a `weights` dictionary to [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) and in this dictionary include a key of `\"method\"` whose value is your custom semantic similarity function for that object type.\n", "\n", "If you provide your own custom semantic similarity method, you **must also provide the weights for each of the properties** (unless, for some reason, your custom method is weights-agnostic). However, since you are writing the custom method, your weights need not necessarily follow the tuple format specified in the above code box.\n", "\n", "Note also that if you want detailed results with `prop_scores` you will need to implement that in your custom function, but you are not required to do so.\n", "\n", "In this next example we use our own custom semantic similarity function to compare two `ThreatActor`s, and do not support `prop_scores`." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Using standard weights: 16.666666666666668\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Using a custom method: 6.66666666666667\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def custom_semantic_similarity_method(obj1, obj2, **weights):\n", " sum_weights = 0\n", " matching_score = 0\n", " # Compare name\n", " w = weights['name']\n", " sum_weights += w\n", " contributing_score = w * stix2.equivalence.object.partial_string_based(obj1['name'], obj2['name'])\n", " matching_score += contributing_score\n", " # Compare aliases only for spies\n", " if 'spy' in obj1['threat_actor_types'] + obj2['threat_actor_types']:\n", " w = weights['aliases']\n", " sum_weights += w\n", " contributing_score = w * stix2.equivalence.object.partial_list_based(obj1['aliases'], obj2['aliases'])\n", " matching_score += contributing_score\n", " \n", " return matching_score, sum_weights\n", "\n", "weights = {\n", " \"threat-actor\": {\n", " \"name\": 60,\n", " \"aliases\": 40,\n", " \"method\": custom_semantic_similarity_method\n", " }\n", "}\n", "\n", "print(\"Using standard weights: %s\" % (env.object_similarity(ta5, ta6)))\n", "print(\"Using a custom method: %s\" % (env.object_similarity(ta5, ta6, **weights)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also write custom functions for comparing objects of your own custom types. Like in the previous example, you can use the built-in functions listed above to help with this, or write your own. In the following example we define semantic similarity for our new `x-foobar` object type. Notice that this time we have included support for detailed results with `prop_scores`." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
71.42857142857143\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
{'name': (60, 60.0), 'color': (40, 11.428571428571427), 'matching_score': 71.42857142857143, 'sum_weights': 100.0}\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def _x_foobar_checks(obj1, obj2, prop_scores, **weights):\n", " matching_score = 0.0\n", " sum_weights = 0.0\n", " if stix2.equivalence.object.check_property_present(\"name\", obj1, obj2):\n", " w = weights[\"name\"]\n", " sum_weights += w\n", " contributing_score = w * stix2.equivalence.object.partial_string_based(obj1[\"name\"], obj2[\"name\"])\n", " matching_score += contributing_score\n", " prop_scores[\"name\"] = (w, contributing_score)\n", " if stix2.equivalence.object.check_property_present(\"color\", obj1, obj2):\n", " w = weights[\"color\"]\n", " sum_weights += w\n", " contributing_score = w * stix2.equivalence.object.partial_string_based(obj1[\"color\"], obj2[\"color\"])\n", " matching_score += contributing_score\n", " prop_scores[\"color\"] = (w, contributing_score)\n", " \n", " prop_scores[\"matching_score\"] = matching_score\n", " prop_scores[\"sum_weights\"] = sum_weights\n", " return matching_score, sum_weights\n", "\n", "prop_scores = {}\n", "weights = {\n", " \"x-foobar\": {\n", " \"name\": 60,\n", " \"color\": 40,\n", " \"method\": _x_foobar_checks,\n", " },\n", " \"_internal\": {\n", " \"ignore_spec_version\": False,\n", " },\n", "}\n", "foo1 = {\n", " \"type\":\"x-foobar\",\n", " \"id\":\"x-foobar--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061\",\n", " \"name\": \"Zot\",\n", " \"color\": \"red\",\n", "}\n", "foo2 = {\n", " \"type\":\"x-foobar\",\n", " \"id\":\"x-foobar--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061\",\n", " \"name\": \"Zot\",\n", " \"color\": \"blue\",\n", "}\n", "print(env.object_similarity(foo1, foo2, prop_scores, **weights))\n", "print(prop_scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Graph Similarity and Equivalence\n", "\n", "The next logical step for checking if two individual objects are similar or equivalent is to check all relevant neighbors and related objects for the best matches. 
It can help you determine if you have seen similar intelligence in the past and builds upon the foundation of the local object similarity comparisons described above. The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has two functions with similar requirements for graph-based checks.\n", "\n", "For each supported object type, the [graph_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.graph_similarity) function compares each object against all objects of the same type in the other graph and keeps the match that maximizes the score obtained from the property comparisons. It requires two DataStore instances, which represent the two graphs to be compared and allow the algorithm to make additional checks such as de-referencing objects. Internally it calls [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity).\n", "\n", "Some limitations exist that are important to understand when analyzing the results of this algorithm.\n", "- Only STIX types with weights defined will be checked. This could result in a maximal sub-graph and score that are smaller than expected. We recommend looking at `prop_scores` or the logging output for details and to understand how the result was calculated.\n", "- Failure to de-reference an object for checks will result in a 0 for that property. This applies to `*_ref` or `*_refs` properties.\n", "- Keep reasonable expectations in terms of how long it takes to run, especially with DataStores that require network communication or when the number of items in the graphs is high. You can also tune how much depth the algorithm should check in de-reference calls; this can affect your running time.\n", "\n", "**Please note** that you will need to install the TAXII dependencies in addition to the semantic requirements if you plan on using the TAXII DataStore classes. 
You can do this using:\n", "\n", "```pip install stix2[taxii]```\n", "\n", "#### Graph Similarity and Equivalence Example\n", "\n", "By default, the algorithm uses the default weights defined for [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) in combination with those defined for [graph_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.graph_similarity)." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
59.68831168831168\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
False\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
{\n",
       "    "matching_score": 835.6363636363635,\n",
       "    "len_pairs": 14,\n",
       "    "summary": {\n",
       "        "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f": {\n",
       "            "lhs": "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",\n",
       "            "rhs": "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 6.666666666666669\n",
       "                },\n",
       "                "threat_actor_types": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 0.0\n",
       "                },\n",
       "                "aliases": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 0.0\n",
       "                },\n",
       "                "matching_score": 6.666666666666669,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 6.66666666666667\n",
       "        },\n",
       "        "campaign--02eb6d99-15d3-4534-99ce-d5f946ca52fe": {\n",
       "            "lhs": "campaign--02eb6d99-15d3-4534-99ce-d5f946ca52fe",\n",
       "            "rhs": "campaign--d7fecca0-d020-43ae-977d-8d226df84c36",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 18.0\n",
       "                },\n",
       "                "matching_score": 18.0,\n",
       "                "sum_weights": 60.0\n",
       "            },\n",
       "            "value": 30.0\n",
       "        },\n",
       "        "campaign--d7fecca0-d020-43ae-977d-8d226df84c36": {\n",
       "            "lhs": "campaign--d7fecca0-d020-43ae-977d-8d226df84c36",\n",
       "            "rhs": "campaign--02eb6d99-15d3-4534-99ce-d5f946ca52fe",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 18.0\n",
       "                },\n",
       "                "matching_score": 18.0,\n",
       "                "sum_weights": 60.0\n",
       "            },\n",
       "            "value": 30.0\n",
       "        },\n",
       "        "indicator--d17a1296-d6c9-4119-9fbf-433c7f1f11af": {\n",
       "            "lhs": "indicator--d17a1296-d6c9-4119-9fbf-433c7f1f11af",\n",
       "            "rhs": "indicator--d2e7d0b6-4229-447d-9c44-2b0f7d93797b",\n",
       "            "prop_score": {\n",
       "                "indicator_types": {\n",
       "                    "weight": 15,\n",
       "                    "contributing_score": 15.0\n",
       "                },\n",
       "                "pattern": {\n",
       "                    "weight": 80,\n",
       "                    "contributing_score": 0\n",
       "                },\n",
       "                "valid_from": {\n",
       "                    "weight": 5,\n",
       "                    "contributing_score": 5.0\n",
       "                },\n",
       "                "matching_score": 20.0,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 20.0\n",
       "        },\n",
       "        "indicator--d2e7d0b6-4229-447d-9c44-2b0f7d93797b": {\n",
       "            "lhs": "indicator--d2e7d0b6-4229-447d-9c44-2b0f7d93797b",\n",
       "            "rhs": "indicator--d17a1296-d6c9-4119-9fbf-433c7f1f11af",\n",
       "            "prop_score": {\n",
       "                "indicator_types": {\n",
       "                    "weight": 15,\n",
       "                    "contributing_score": 15.0\n",
       "                },\n",
       "                "pattern": {\n",
       "                    "weight": 80,\n",
       "                    "contributing_score": 0\n",
       "                },\n",
       "                "valid_from": {\n",
       "                    "weight": 5,\n",
       "                    "contributing_score": 5.0\n",
       "                },\n",
       "                "matching_score": 20.0,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 20.0\n",
       "        },\n",
       "        "relationship--b399060e-0cdb-4e41-a30e-5894ae3627e8": {\n",
       "            "lhs": "relationship--b399060e-0cdb-4e41-a30e-5894ae3627e8",\n",
       "            "rhs": "relationship--b97e59e9-5e0d-47ef-a3f9-6a6e4fcefaab",\n",
       "            "prop_score": {\n",
       "                "relationship_type": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 20.0\n",
       "                },\n",
       "                "source_ref": {\n",
       "                    "weight": 40,\n",
       "                    "contributing_score": 2.666666666666668\n",
       "                },\n",
       "                "target_ref": {\n",
       "                    "weight": 40,\n",
       "                    "contributing_score": 36.0\n",
       "                },\n",
       "                "matching_score": 58.66666666666667,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 58.666666666666664\n",
       "        },\n",
       "        "relationship--b97e59e9-5e0d-47ef-a3f9-6a6e4fcefaab": {\n",
       "            "lhs": "relationship--b97e59e9-5e0d-47ef-a3f9-6a6e4fcefaab",\n",
       "            "rhs": "relationship--b399060e-0cdb-4e41-a30e-5894ae3627e8",\n",
       "            "prop_score": {\n",
       "                "relationship_type": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 20.0\n",
       "                },\n",
       "                "source_ref": {\n",
       "                    "weight": 40,\n",
       "                    "contributing_score": 2.666666666666668\n",
       "                },\n",
       "                "target_ref": {\n",
       "                    "weight": 40,\n",
       "                    "contributing_score": 36.0\n",
       "                },\n",
       "                "matching_score": 58.66666666666667,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 58.666666666666664\n",
       "        },\n",
       "        "report--87a26bd6-2870-44de-980f-e4cc6b63e1d5": {\n",
       "            "lhs": "report--87a26bd6-2870-44de-980f-e4cc6b63e1d5",\n",
       "            "rhs": "report--a71101c7-6064-4b8f-a9b4-ff49ff65e524",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 30,\n",
       "                    "contributing_score": 30.0\n",
       "                },\n",
       "                "published": {\n",
       "                    "weight": 10,\n",
       "                    "contributing_score": 10.0\n",
       "                },\n",
       "                "object_refs": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 29.0\n",
       "                },\n",
       "                "matching_score": 69.0,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 69.0\n",
       "        },\n",
       "        "report--a71101c7-6064-4b8f-a9b4-ff49ff65e524": {\n",
       "            "lhs": "report--a71101c7-6064-4b8f-a9b4-ff49ff65e524",\n",
       "            "rhs": "report--87a26bd6-2870-44de-980f-e4cc6b63e1d5",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 30,\n",
       "                    "contributing_score": 30.0\n",
       "                },\n",
       "                "published": {\n",
       "                    "weight": 10,\n",
       "                    "contributing_score": 10.0\n",
       "                },\n",
       "                "object_refs": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 29.0\n",
       "                },\n",
       "                "matching_score": 69.0,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 69.0\n",
       "        },\n",
       "        "identity--2b40ba3f-aa22-4e11-bd9d-e4843927ad32": {\n",
       "            "lhs": "identity--2b40ba3f-aa22-4e11-bd9d-e4843927ad32",\n",
       "            "rhs": "identity--4d8b54e3-d584-47c6-858f-673fffa45e96",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 60.0\n",
       "                },\n",
       "                "identity_class": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 20.0\n",
       "                },\n",
       "                "matching_score": 80.0,\n",
       "                "sum_weights": 80.0\n",
       "            },\n",
       "            "value": 100.0\n",
       "        },\n",
       "        "identity--4d8b54e3-d584-47c6-858f-673fffa45e96": {\n",
       "            "lhs": "identity--4d8b54e3-d584-47c6-858f-673fffa45e96",\n",
       "            "rhs": "identity--2b40ba3f-aa22-4e11-bd9d-e4843927ad32",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 60,\n",
       "                    "contributing_score": 60.0\n",
       "                },\n",
       "                "identity_class": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 20.0\n",
       "                },\n",
       "                "matching_score": 80.0,\n",
       "                "sum_weights": 80.0\n",
       "            },\n",
       "            "value": 100.0\n",
       "        },\n",
       "        "attack-pattern--57bc38b5-feda-4710-b613-441717c0062c": {\n",
       "            "lhs": "attack-pattern--57bc38b5-feda-4710-b613-441717c0062c",\n",
       "            "rhs": "attack-pattern--d9de40c6-a9a0-4e6f-ae59-d90a91e4f0e8",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 30,\n",
       "                    "contributing_score": 21.818181818181817\n",
       "                },\n",
       "                "external_references": {\n",
       "                    "weight": 70,\n",
       "                    "contributing_score": 70.0\n",
       "                },\n",
       "                "matching_score": 91.81818181818181,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 91.81818181818181\n",
       "        },\n",
       "        "attack-pattern--d9de40c6-a9a0-4e6f-ae59-d90a91e4f0e8": {\n",
       "            "lhs": "attack-pattern--d9de40c6-a9a0-4e6f-ae59-d90a91e4f0e8",\n",
       "            "rhs": "attack-pattern--57bc38b5-feda-4710-b613-441717c0062c",\n",
       "            "prop_score": {\n",
       "                "name": {\n",
       "                    "weight": 30,\n",
       "                    "contributing_score": 21.818181818181817\n",
       "                },\n",
       "                "external_references": {\n",
       "                    "weight": 70,\n",
       "                    "contributing_score": 70.0\n",
       "                },\n",
       "                "matching_score": 91.81818181818181,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 91.81818181818181\n",
       "        },\n",
       "        "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e": {\n",
       "            "lhs": "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e",\n",
       "            "rhs": "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e",\n",
       "            "prop_score": {\n",
       "                "malware_types": {\n",
       "                    "weight": 20,\n",
       "                    "contributing_score": 10.0\n",
       "                },\n",
       "                "name": {\n",
       "                    "weight": 80,\n",
       "                    "contributing_score": 80.0\n",
       "                },\n",
       "                "matching_score": 90.0,\n",
       "                "sum_weights": 100.0\n",
       "            },\n",
       "            "value": 90.0\n",
       "        }\n",
       "    }\n",
       "}\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "\n", "from stix2 import Relationship\n", "\n", "\n", "g1 = [\n", " AttackPattern(\n", " name=\"Phishing\",\n", " external_references=[\n", " {\n", " \"url\": \"https://example2\",\n", " \"source_name\": \"some-source2\",\n", " },\n", " ],\n", " ),\n", " Campaign(name=\"Someone Attacks Somebody\"),\n", " Identity(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", " description=\"Just some guy\",\n", " ),\n", " Indicator(\n", " indicator_types=['malicious-activity'],\n", " pattern_type=\"stix\",\n", " pattern=\"[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']\",\n", " valid_from=\"2017-01-01T12:34:56Z\",\n", " ),\n", " Malware(id=MALWARE_ID,\n", " malware_types=['ransomware'],\n", " name=\"Cryptolocker\",\n", " is_family=False,\n", " ),\n", " ThreatActor(id=THREAT_ACTOR_ID,\n", " threat_actor_types=[\"crime-syndicate\"],\n", " name=\"Evil Org\",\n", " aliases=[\"super-evil\"],\n", " ),\n", " Relationship(\n", " source_ref=THREAT_ACTOR_ID,\n", " target_ref=MALWARE_ID,\n", " relationship_type=\"uses\",\n", " ),\n", " Report(\n", " report_types=[\"campaign\"],\n", " name=\"Bad Cybercrime\",\n", " published=\"2016-04-06T20:03:00.000Z\",\n", " object_refs=[THREAT_ACTOR_ID, MALWARE_ID],\n", " ),\n", "]\n", "\n", "g2 = [\n", " AttackPattern(\n", " name=\"Spear phishing\",\n", " external_references=[\n", " {\n", " \"url\": \"https://example2\",\n", " \"source_name\": \"some-source2\",\n", " },\n", " ],\n", " ),\n", " Campaign(name=\"Another Campaign\"),\n", " Identity(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", " description=\"A person\",\n", " ),\n", " Indicator(\n", " indicator_types=['malicious-activity'],\n", " pattern_type=\"stix\",\n", " pattern=\"[file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4']\",\n", " valid_from=\"2017-01-01T12:34:56Z\",\n", " ),\n", " 
Malware(id=MALWARE_ID,\n", " malware_types=['ransomware', 'dropper'],\n", " name=\"Cryptolocker\",\n", " is_family=False,\n", " ),\n", " ThreatActor(id=THREAT_ACTOR_ID,\n", " threat_actor_types=[\"spy\"],\n", " name=\"James Bond\",\n", " aliases=[\"007\"],\n", " ),\n", " Relationship(\n", " source_ref=THREAT_ACTOR_ID,\n", " target_ref=MALWARE_ID,\n", " relationship_type=\"uses\",\n", " ),\n", " Report(\n", " report_types=[\"campaign\"],\n", " name=\"Bad Cybercrime\",\n", " published=\"2016-04-06T20:03:00.000Z\",\n", " object_refs=[THREAT_ACTOR_ID, MALWARE_ID],\n", " ),\n", "]\n", "\n", "memstore1 = MemoryStore(g1)\n", "memstore2 = MemoryStore(g2)\n", "prop_scores = {}\n", "\n", "similarity_result = env.graph_similarity(memstore1, memstore2, prop_scores)\n", "equivalence_result = env.graph_equivalence(memstore1, memstore2, threshold=60)\n", "\n", "print(similarity_result)\n", "print(equivalence_result)\n", "print(json.dumps(prop_scores, indent=4, sort_keys=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above uses the same objects found in previous examples to demonstrate how graph similarity and equivalence are used. Under this approach, Grouping, Relationship, Report, and Sighting have default weights defined, allowing object de-referencing. The Relationship and Report objects show their `*_ref` and `*_refs` properties, respectively, checked in the summary output. Analyzing the similarity output, we can observe that some objects scored high when checked individually, but when the rest of the graph is taken into account, discrepancies add up and produce a lower score." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0a6" } }, "nbformat": 4, "nbformat_minor": 2 }