Excuse me, do you speak fraud?

Network graph analysis for fraud detection and mitigation

by Scott Mongeau

Executive summary

Network analysis offers a new set of techniques to tackle the persistent and growing problem of complex fraud. Network analysis supplements traditional techniques by providing a mechanism to bridge investigative and analytics methods. Beyond basic visualization, network analysis provides a standardized platform for complex fraud pattern storage and retrieval, pattern discovery and detection, statistical analysis, and risk scoring. This article gives an overview of the main challenges and demonstrates a promising approach using a hands-on example.

Understanding the problem of fraud detection

With swelling globalization, advanced digital communication technology, and international financial deregulation, fraud investigators face a daunting battle against increasingly sophisticated fraudsters. Fraud is estimated to encompass 5% of the global economy, resulting in an annual loss of more than €2.3 trillion. Further, indications are that fraud is growing in volume, scope, and sophistication.


In an increasingly global and virtual world, the methods for perpetrating fraud are growing in sophistication. As well, fraudsters are increasingly able to collaborate in international rings to perpetrate their schemes and to distribute ill-acquired gains.


A complication in the effort to detect and mitigate complex cases of fraud is the difficulty of smoothly bridging the worlds of forensic investigation and data analytics. Fraud investigators, with deep domain knowledge and street smarts, plough through complex documents, interview parties of interest, and spend time understanding the arcane schemes by which fraudsters attempt to avoid detection.

However, the growing scale of complex fraud means that investigators are increasingly being overwhelmed by volume. Additionally, fraud specialists have deep knowledge of complex fraud cases – tacit knowledge – yet it is difficult to make this knowledge explicit such that it is suitable for efficient sharing. This is especially the case in terms of difficulties in describing complex fraud in terms of patterns suitable for systems-driven detection and analysis.

Meanwhile, data analytics experts gather, transform, and analyse datasets for possible fraud, addressing the challenges of scale and volume via ‘big data’ approaches. They apply statistical techniques for detecting outliers and algorithmic techniques, such as machine learning, for identifying suspicious patterns.

However, fraud detection via advanced analytics typically depends on structured datasets and structured data models, and it is rare that exhaustive datasets are available which encompass all the domains surrounding complex cases of fraud.

As well, machine learning and data mining methods are primarily ‘supervised’, meaning they require training datasets which contain known fraud cases. Sophisticated fraudsters often are knowledgeable concerning automated detection methods and take pains to evade such detection. As a result, complex and ‘innovative’ types of fraud potentially circumvent automated detection when the methods avoid upsetting standard processes (i.e. they leave a seemingly ‘normal’ data trail).

Network analytics for fraud detection

Network analytics is a powerful tool to amplify traditional fraud investigative approaches – a method for cataloguing known, detecting hidden, and discovering new types of fraud. What is principally lacking in the disconnection between the forensics world and the world of data analytics is a transparent, standard language for communicating and searching for complex fraud patterns.

The fraud investigative world deals in rich details and confronts constantly emerging and evolving techniques. The computational world typically communicates and conducts its analysis in terms of highly structured, abstract datasets. Datasets are often limited in scope, and relational database models are slow to accommodate rapidly evolving schemes. Somewhere along the line, the rich complexities of fraud schemes evade both hands-on and automated detection.

This is where network graph analysis is of central value: it offers a method for capturing the rich context of fraud in a standard, machine-readable, and transferable format. Once captured in such a format, deep pattern and statistical analysis can be conducted on existing datasets. Network analytics is thus a complementary approach which enhances and bridges fraud investigatory and data analytics approaches.

The schemes to dodge or exploit taxes are manifold and range from simple to labyrinthine. In particular, enterprises and institutions suffer when complex schemes are systematized at high volume or involve high-value transactions. Sophisticated fraudsters operating at this scale often operate in rings and across borders.

Case in point – EU VAT fraud

As an example, particular markets in the EU are susceptible to cross-border fraud schemes whereby participants seek to avoid value-added-tax (VAT) charges and exploit national tax credits. The amounts are substantial, with some EU countries forgoing or improperly crediting VAT at rates as high as 25%. Avoidance and improper credit claims together systemically cut into national tax revenues across the EU.

Fraudsters are savvy in targeting particular markets and borders, often operating via complex sets of cross-border holding companies and ownership structures. Emerging, unregulated, and highly dynamic markets are particularly at risk, such as those associated with emerging or high-volume specialized commodities. As well, markets which deal in tradable rights or other intangibles are at risk, as they do not leave a physical trail (i.e. lack of witnesses, shipping records, and storage manifests).

Via native network data analysis, such complex fraud schemes can be described in both their general and specific manifestations. As an example, a recognized VAT fraud involves trading international telecommunication rights (the exchange of rights to telecommunication service). The pattern of a particular scheme was translated into a network format and stored in a ‘graph database’ (a native database for storing, managing, and retrieving networked data):


Figure 1: Cross-border EU value-added-tax fraud scheme involving a missing trader and tax credit abuse as encoded in a standard network format with countries denoted (names fictionalized)

The scheme can be summarized as follows:

  1. Southern Europa Telco (3-) buys U.S. phone card rights from two U.S. companies (1- and 2-),
  2. Southern Europa Telco (3-) re-sells within Italy to Joint Bridge Co. (4-) and collects VAT,
  3. Southern Europa Telco (3-) does not pay VAT to Italian tax authorities, instead disappearing with the VAT and becoming a missing trader,
  4. Joint Bridge Co. (4-) resells to Swift Co. (6-) within Italy via parent company Joint IT Group (A-),
  5. Swift Co. (6-) pays VAT to Joint IT Group (A-),
  6. Swift Co. (6-) sells across border to UK Chips Trading Ltd. (7-) and U.S. Nexus Global US Ltd. (11-),
  7. Swift Co. claims VAT credit from Italian tax authorities to offset other international business activities,
  8. Chips Trading Ltd. (7-) sells to Strand VI Co. (9-) in the Virgin Islands via sister firm Chips Global (8-) within Chips UK Group (B-) – this allows Chips UK Group (B-) to claim VAT neutrality,
  9. Strand VI Co. in the Virgin Islands becomes the final recipient of the phone card rights, which can then be recycled to the U.S. Presumably a back-door mechanism exists within the Virgin Islands for participants to share in the benefits: VAT appropriated by missing trader and Italian VAT tax credits.

Recognized schemes, often the result of an intensive fraud investigation, can thus be encoded using a standard format. The pattern can then be used to detect similar transactions in large datasets. However, the Italian national tax authority, absent full details from foreign tax authorities, likely only has insight into a reduced transactional view of this scheme. Namely, only initial transactions across the border and within Italy are likely visible:


   Figure 2:  Cross-border EU VAT tax fraud scheme from the perspective of Italian tax authorities

In this reduced view, it becomes difficult for the Italian tax authorities to apply traditional automated detection methods (e.g. data mining or machine learning). However, because the full VAT fraud scheme has been documented in a network format, characteristic details of the fraud are documented with it. In particular, the full fraud pattern documented previously captures several unusual aspects of the Italian companies:

  1. Transience of the missing trader: the chief hallmark of this fraud pattern is evidence that the missing trader is a ‘front’ – a company set up quickly with the intention of disappearing quickly. Data from the Chamber of Commerce and tax office may show that the company's inception date is close to the initial purchase transaction, triggering an alert. As well, upon a warning, forensic investigators can examine additional details to substantiate the company as being ‘at risk’ – for instance, a false or non-answering phone number, an unoccupied address, and/or a ‘fake’ website,
  2. Velocity: for the fraud to operate at a low risk of detection, the entire transaction chain is likely completed in a relatively compressed period of time (ideally before the missing trader is detected by the tax office) – the short time-span (based on the date stamps on the transactions in the data) can be calculated and detected,
  3. Position of the missing trader: the missing trader is the initial purchaser at the border – the entire rapid transaction chain (as per item 2) exiting the country in three steps could be used to trigger an alert to immediately check the validity of the initial purchaser (as per item 1),
  4. Volume and/or scale: for the fraud to be commercially viable, it needs to be conducted either at great volume or at great scale – multiple transaction chains along the same path in a short time period and/or large transactions are potential alerts to run the check in item 1,
  5. Additional data: company ownership by citizens (national citizen number) can be layered onto network data – citizens with ownership stakes in two or more companies in the transaction chain would be considered suspicious, for instance, and
  6. Third-party data: data from the police, banks, and credit agencies can be layered onto the network data to identify individuals and companies with a high risk for fraud, and the resulting scores can be used in aggregate to rate a transaction chain as high risk.
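
Several of these indicators reduce to simple arithmetic on timestamps and amounts. The following is a minimal Python sketch, not the author's implementation. The `Sale` record, the function names, and the `min_amount` threshold are hypothetical; the 15-day and 90-day limits are the illustrative values used in the query discussion later in this article; the epoch values are borrowed from the article's example data.

```python
from dataclasses import dataclass

DAY = 60 * 60 * 24  # seconds in a day


@dataclass(frozen=True)
class Sale:
    seller: str
    buyer: str
    epoch: int   # transaction time, Unix seconds
    amount: int  # transaction amount


# Company creation times (Unix seconds), borrowed from the example data.
CREATED = {
    "1-Red Phonecard Co.": 1377976450,
    "3-Southern Europa Telco": 1377516603,
    "4-Joint Bridge Co.": 1376927344,
}

# One transaction chain: a purchase at the border, then a domestic resale.
CHAIN = [
    Sale("1-Red Phonecard Co.", "3-Southern Europa Telco", 1380617873, 10_000_000),
    Sale("3-Southern Europa Telco", "4-Joint Bridge Co.", 1381395473, 25_000_000),
]


def chain_duration_days(chain):
    """Velocity: days elapsed between the first and last sale in the chain."""
    return (chain[-1].epoch - chain[0].epoch) / DAY


def first_buyer_age_days(chain, created):
    """Transience and position: age of the initial purchaser at its first purchase."""
    first = chain[0]
    return (first.epoch - created[first.buyer]) / DAY


def indicators(chain, created, max_days=15, max_age_days=90, min_amount=1_000_000):
    """Return one boolean flag per indicator (thresholds are assumptions)."""
    return {
        "fast_chain": chain_duration_days(chain) < max_days,
        "young_first_buyer": first_buyer_age_days(chain, created) < max_age_days,
        "large_amount": max(s.amount for s in chain) >= min_amount,
    }


if __name__ == "__main__":
    print(indicators(CHAIN, CREATED))
```

In practice each flag would more plausibly feed a weighted score than act as a hard rule, since any single indicator on its own can also describe a legitimate start-up.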

Working with the Neo4j graph database, we can encode such a fraud scheme pattern via a Cypher statement. This pattern represents an approximation of the limited set of transactions visible to the Italian authorities:

CREATE (CO1:Company { name: '1-Red Phonecard Co.', country: 'USA', type: 'LLC', creation_date: '20/08/2013', epoch:1377976450})
CREATE (CO2:Company { name: '2-Black Phonecard Co.', country: 'USA', type: 'LLC', creation_date: '21/09/2013', epoch:1376210064})
CREATE (CO3:Company { name: '3-Southern Europa Telco', country: 'Italy', type: 'SRL', creation_date: '24/09/2013', epoch:1377516603})
CREATE (CO4:Company { name: '4-Joint Bridge Co.', country: 'Italy', type: 'SRL', creation_date: '12/08/2013', epoch:1376927344})
CREATE (CO5:Company { name: '5-Joint Telco Co.', country: 'Italy', type: 'SRL', creation_date: '12/09/2013', epoch:1376272617})
CREATE (CO6:Company { name: '6-Swift Co.', country: 'Italy', type: 'SpA', creation_date: '13/08/2013', epoch:1377717413})
CREATE (CO7:Company { name: '7-Chips Trading Ltd.', country: 'UK', type: 'LTD', creation_date: '22/09/2013', epoch:1376163978})
CREATE (CO8:Company { name: '8-Chips Global', country: 'UK', type: 'LLC', creation_date: '20/09/2013', epoch:1376524839})
CREATE (CO9:Company { name: '9-Strand VI Co.', country: 'UK', type: 'LLC', creation_date: '13/09/2013', epoch:1375492877})
CREATE (CO10:Company { name: '10-Nexus Trading UK Ltd.', country: 'UK', type: 'LTD', creation_date: '26/08/2013', epoch:1376265272})
CREATE (CO11:Company { name: '11-Nexus Global US Ltd.', country: 'USA', type: 'LTD', creation_date: '27/09/2013', epoch:1375770509})
CREATE (CO12:HoldingCo { name: 'A-Joint IT Group', country: 'Italy', type: 'Holding', creation_date: '22/07/2013', epoch:1374601712})
CREATE (CO13:HoldingCo { name: 'B-Chips UK Group', country: 'UK', type: 'Holding', creation_date: '11/07/2013', epoch:1374008015})
CREATE (CO14:HoldingCo { name: 'C-Nexus Intl Group', country: 'UK', type: 'Holding', creation_date: '17/07/2013', epoch:1373787200})
CREATE (P01:Person { name: 'Alberico Goffredo', country: 'Italy', age: '28', criminal_status: 'nothing'})
CREATE (P02:Person { name: 'Charlie Walt', country: 'USA', age: '44', criminal_status: 'nothing'})
CREATE (P03:Person { name: 'Cletis Bysshe', country: 'USA', age: '28', criminal_status: 'nothing'})
CREATE (P04:Person { name: 'Nicodemo Gionata', country: 'Italy', age: '53', criminal_status: 'known crook'})
CREATE (P05:Person { name: 'Carmelo Achille', country: 'Italy', age: '34', criminal_status: 'nothing'})
CREATE (P06:Person { name: 'Edoardo Primo', country: 'Italy', age: '58', criminal_status: 'nothing'})
CREATE (P07:Person { name: 'Cam Esmond', country: 'UK', age: '41', criminal_status: 'known crook'})
CREATE (P08:Person { name: 'Peyton Ewart', country: 'UK', age: '48', criminal_status: 'nothing'})
CREATE (P09:Person { name: 'Vivian Vann', country: 'UK', age: '56', criminal_status: 'nothing'})
CREATE (P10:Person { name: 'Madilyn Hailey', country: 'UK', age: '53', criminal_status: 'known crook'})
CREATE (P11:Person { name: 'Suzanna Salvage', country: 'UK', age: '30', criminal_status: 'nothing'})
CREATE (P12:Person { name: 'John Hudson', country: 'UK', age: '32', criminal_status: 'nothing'})
CREATE (CO1)-[:SELLS_TO{date: '41548', item_type: 'phone cards rights', epoch: 1380617873, amt: '10000000'}]->(CO3)
CREATE (CO2)-[:SELLS_TO{date: '41548', item_type: 'phone cards rights', epoch: 1380617873, amt: '15000000'}]->(CO3)
CREATE (CO3)-[:SELLS_TO{date: '41557', item_type: 'phone cards rights', epoch: 1381395473, amt: '25000000'}]->(CO4)
CREATE (CO12)-[:SELLS_TO{date: '41562', item_type: 'phone cards rights', epoch: 1381827473, amt: '25000000'}]->(CO6)
CREATE (CO6)-[:SELLS_TO{date: '41567', item_type: 'phone cards rights', epoch: 1382259473, amt: '25000000'}]->(CO7)
CREATE (CO6)-[:SELLS_TO{date: '41572', item_type: 'phone cards rights', epoch: 1382691473, amt: '25000000'}]->(CO11)
CREATE (CO8)-[:SELLS_TO{date: '41577', item_type: 'phone cards rights', epoch: 1383123473, amt: '25000000'}]->(CO9)
CREATE (CO3)-[:COLLECTS_VAT{date: '41557', item_type: 'VAT paid', epoch: 1381395473, amt: '10000000'}]->(CO4)
CREATE (CO12)-[:COLLECTS_VAT{date: '41562', item_type: 'VAT paid', epoch: 1381827473, amt: '10000000'}]->(CO6)
CREATE (CO12)-[:PARENT_OF{legal_status: 'parent company'}]->(CO4)
CREATE (CO12)-[:PARENT_OF{legal_status: 'parent company'}]->(CO5)
CREATE (CO13)-[:PARENT_OF{legal_status: 'parent company'}]->(CO7)
CREATE (CO13)-[:PARENT_OF{legal_status: 'parent company'}]->(CO8)
CREATE (CO14)-[:PARENT_OF{legal_status: 'parent company'}]->(CO10)
CREATE (CO14)-[:PARENT_OF{legal_status: 'parent company'}]->(CO11)
CREATE (P01)-[:DIRECTOR_OF{position: 'director'}]->(CO1)
CREATE (P02)-[:DIRECTOR_OF{position: 'director'}]->(CO2)
CREATE (P03)-[:DIRECTOR_OF{position: 'director'}]->(CO3)
CREATE (P04)-[:DIRECTOR_OF{position: 'director'}]->(CO4)
CREATE (P05)-[:DIRECTOR_OF{position: 'director'}]->(CO5)
CREATE (P06)-[:DIRECTOR_OF{position: 'director'}]->(CO6)
CREATE (P07)-[:DIRECTOR_OF{position: 'director'}]->(CO7)
CREATE (P08)-[:DIRECTOR_OF{position: 'director'}]->(CO8)
CREATE (P09)-[:DIRECTOR_OF{position: 'director'}]->(CO9)
CREATE (P10)-[:DIRECTOR_OF{position: 'director'}]->(CO10)
CREATE (P02)-[:DIRECTOR_OF{position: 'director'}]->(CO11)
CREATE (P04)-[:DIRECTOR_OF{position: 'director'}]->(CO12)
CREATE (P11)-[:DIRECTOR_OF{position: 'director'}]->(CO13)
CREATE (P12)-[:DIRECTOR_OF{position: 'director'}]->(CO14)
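
The DIRECTOR_OF relationships above make indicator 5 (one person tied to two or more companies in the same scheme) straightforward to check. Below is a small Python sketch of that check, under stated assumptions: directorships stand in for the ownership stakes mentioned in the text, the identifiers are simply the variable names from the CREATE statements, and the function name `shared_directors` is hypothetical.

```python
from collections import defaultdict

# (person, company) pairs copied from the DIRECTOR_OF relationships.
DIRECTOR_OF = [
    ("P01", "CO1"), ("P02", "CO2"), ("P03", "CO3"), ("P04", "CO4"),
    ("P05", "CO5"), ("P06", "CO6"), ("P07", "CO7"), ("P08", "CO8"),
    ("P09", "CO9"), ("P10", "CO10"), ("P02", "CO11"), ("P04", "CO12"),
    ("P11", "CO13"), ("P12", "CO14"),
]


def shared_directors(edges, companies=None):
    """Map each person to their companies, keeping only people tied to 2+.

    If `companies` is given, only those companies are considered, e.g. the
    members of one transaction chain plus their parent companies.
    """
    by_person = defaultdict(set)
    for person, company in edges:
        if companies is None or company in companies:
            by_person[person].add(company)
    return {p: sorted(cs) for p, cs in by_person.items() if len(cs) >= 2}


if __name__ == "__main__":
    for person, cos in shared_directors(DIRECTOR_OF).items():
        print(person, "directs", ", ".join(cos))
```

On this data the check surfaces P02 (CO2 and CO11) and P04 (CO4 and its parent CO12). Whether such an overlap is suspicious depends on context, since a shared director between a subsidiary and its holding company is often ordinary.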

Given this pattern and knowing the tell-tale aspects of the fraud, a query can be developed which will identify a similar pattern in a large set of transactional data. In this example, we would like to identify any sets of cross-border telecommunication rights trades occurring over a short period of time (e.g. less than 15 days) and whereby an intermediary company in the chain of transactions is quite new (e.g. less than 90 days old).

Working with Cypher, we can query a large Neo4j dataset for this specific pattern in tax transactions (thanks to Jean Villedieu of linkurio.us for support and Jim Biard for the query design):

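// Find chains of sales that start and end in different countries
MATCH p=(a:Company)-[rs:SELLS_TO*]->(c:Company)
WHERE a.country <> c.country
WITH p, a, c, rs, nodes(p) AS ns
// Keep companies less than 90 days old relative to the reference date
// 1383123473 (the epoch of the last transaction in the example data).
// A list comprehension is used because filter() was removed in Neo4j 4.0.
WITH p, a, c, rs, [n IN ns WHERE 1383123473 - n.epoch < (90*60*60*24)] AS bs
WITH p, a, c, rs, head(bs) AS b
WITH p, a, b, c, head(rs) AS r1, last(rs) AS rn
// d = seconds elapsed between the first and last sale in the chain
WITH p, a, b, c, r1, rn, rn.epoch - r1.epoch AS d
WHERE d < (15*60*60*24) AND b IS NOT NULL
RETURN a, b, c, d, r1, rn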
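
For readers without a graph database to hand, the detection logic described in the text (a cross-border chain, completed in under 15 days, passing through a company less than 90 days old) can be approximated in plain Python. This is a sketch under assumptions rather than a translation of the query: it only counts intermediaries (not the endpoints) as candidate ‘new’ companies, it measures company age at the first sale in the chain, and the function names are hypothetical. The data is the subset of the example that takes part in SELLS_TO relationships.

```python
DAY = 60 * 60 * 24  # seconds in a day

COMPANIES = {  # id: (country, creation epoch), from the CREATE statements
    "CO1": ("USA", 1377976450), "CO2": ("USA", 1376210064),
    "CO3": ("Italy", 1377516603), "CO4": ("Italy", 1376927344),
    "CO6": ("Italy", 1377717413), "CO7": ("UK", 1376163978),
    "CO8": ("UK", 1376524839), "CO9": ("UK", 1375492877),
    "CO11": ("USA", 1375770509), "CO12": ("Italy", 1374601712),
}

SELLS_TO = [  # (seller, buyer, transaction epoch)
    ("CO1", "CO3", 1380617873), ("CO2", "CO3", 1380617873),
    ("CO3", "CO4", 1381395473), ("CO12", "CO6", 1381827473),
    ("CO6", "CO7", 1382259473), ("CO6", "CO11", 1382691473),
    ("CO8", "CO9", 1383123473),
]


def chains(edges):
    """Yield every directed chain of one or more sales, never reusing a sale."""
    def extend(chain):
        yield chain
        for edge in edges:
            if edge[0] == chain[-1][1] and edge not in chain:
                yield from extend(chain + [edge])
    for edge in edges:
        yield from extend([edge])


def suspicious_chains(edges, companies, max_days=15, max_age_days=90):
    """Return (path, young intermediaries, duration in days) for each match."""
    hits = []
    for chain in chains(edges):
        start, end = chain[0][0], chain[-1][1]
        if companies[start][0] == companies[end][0]:
            continue  # chain starts and ends in the same country
        duration = chain[-1][2] - chain[0][2]
        if duration >= max_days * DAY:
            continue  # chain is too slow to match the velocity indicator
        intermediaries = [edge[1] for edge in chain[:-1]]
        young = [c for c in intermediaries
                 if chain[0][2] - companies[c][1] < max_age_days * DAY]
        if young:
            path = [start] + [edge[1] for edge in chain]
            hits.append((path, young, duration / DAY))
    return hits


if __name__ == "__main__":
    for path, young, days in suspicious_chains(SELLS_TO, COMPANIES):
        print(" -> ".join(path), "| new:", young, "| days:", days)
```

Enumerating every chain this way grows quickly with the size of the data, which is one reason a native graph database is the more practical home for such queries at scale.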

To summarize, we added an example transaction to a graph database (in practice we assume there would be millions of other transaction chains in a production graph database). Knowing tell-tale indications of the particular VAT fraud scheme, we designed and ran a specific query to identify matching transactions in a large set of transactional data.

The full fraud pattern, stored in a graph database ‘fraud library’ in an annotated, network-descriptive format, gives tell-tale indications for detection in the smaller pattern-set available to the national tax authorities. This then supports detection in a large set of national tax data.

Beyond visualization: statistical measures

The value of storing fraud schemes as standard patterns in a network format (in a graph database) can be summarized as:

  • standardization without sacrificing detail,
  • ability to communicate patterns between systems transparently,
  • ability to amplify patterns with additional data, and
  • ability to run dynamic network queries on ‘big data’ sets.

However, an additional benefit exists – the ability to characterize statistical measures to empower the discovery of new patterns and automatic pattern detection.

Network science and graph analysis encompass rich, existing fields of study which specify and study recurring patterns and quantitative aspects of networks. Likewise, the social sciences have adopted these principles to study social phenomena via social network analysis (SNA).

Together, these domains observe that all network structures have common patterns, and that these patterns can be studied and quantified. Networks can be measured in terms of hard measures such as reach, clustering or modularity, centrality, and dispersion. Transactions entail steps across a network, and these steps can be scored in terms of ‘weight’, for instance in terms of volume, frequency, speed (over time), amount (monetary), or risk (e.g. credit risk). Additionally, individuals and companies can be assessed in terms of their relative positions and interactions in a network.
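
As a small illustration of such measures, the Python sketch below computes normalized degree centrality (a standard measure: a node's number of links divided by the number of other nodes) and a monetary ‘pass-through’ weight for each company in the example's SELLS_TO relationships. The pass-through measure (the smaller of total inflow and total outflow) is a hypothetical illustration, not an established metric, and the function names are assumptions.

```python
from collections import defaultdict

SALES = [  # (seller, buyer, amount), from the SELLS_TO relationships
    ("CO1", "CO3", 10_000_000), ("CO2", "CO3", 15_000_000),
    ("CO3", "CO4", 25_000_000), ("CO12", "CO6", 25_000_000),
    ("CO6", "CO7", 25_000_000), ("CO6", "CO11", 25_000_000),
    ("CO8", "CO9", 25_000_000),
]


def degree_centrality(edges):
    """Links per node (in plus out), normalized by the number of other nodes."""
    nodes = {n for seller, buyer, _ in edges for n in (seller, buyer)}
    degree = defaultdict(int)
    for seller, buyer, _ in edges:
        degree[seller] += 1
        degree[buyer] += 1
    return {n: degree[n] / (len(nodes) - 1) for n in nodes}


def pass_through(edges):
    """Amount that flows both into and out of each node (illustrative only)."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for seller, buyer, amount in edges:
        outflow[seller] += amount
        inflow[buyer] += amount
    nodes = set(inflow) | set(outflow)
    return {n: min(inflow[n], outflow[n]) for n in nodes}


if __name__ == "__main__":
    centrality = degree_centrality(SALES)
    flow = pass_through(SALES)
    for node in sorted(centrality, key=centrality.get, reverse=True):
        print(node, round(centrality[node], 3), flow[node])
```

On this data the two intermediaries, CO3 and CO6, stand out on both measures, while the companies at the ends of the chains have a pass-through of zero.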

Returning to the VAT fraud example, national tax offices have data concerning company cross-ownership and the association of citizens (via national identification numbers). These details can be used to assess the association of known fraudsters or high-risk individuals with others. Thus, a seemingly ‘clean’ company or individual which transacts frequently or in high amounts between two high-risk entities could be flagged as participating in at-risk transactions. The results can then be used to enhance traditional machine learning detection methods.
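
The idea of flagging a ‘clean’ entity that sits between two high-risk ones can be sketched in a few lines of Python. The assumptions here are loud ones: a company is treated as high risk simply because its director carries the ‘known crook’ status in the example data, the association score is a plain fraction of risky trading partners, and both function names are hypothetical. A production scoring model would weigh frequency, amounts, and third-party data as the text describes.

```python
from collections import defaultdict

# Companies whose director is marked 'known crook' in the example data.
HIGH_RISK = {"CO4", "CO7", "CO10", "CO12"}

SELLS_TO = [  # (seller, buyer), from the SELLS_TO relationships
    ("CO1", "CO3"), ("CO2", "CO3"), ("CO3", "CO4"), ("CO12", "CO6"),
    ("CO6", "CO7"), ("CO6", "CO11"), ("CO8", "CO9"),
]


def _partners(edges):
    """Build supplier and customer sets for every company."""
    suppliers, customers = defaultdict(set), defaultdict(set)
    for seller, buyer in edges:
        customers[seller].add(buyer)
        suppliers[buyer].add(seller)
    return suppliers, customers


def bridging_companies(edges, high_risk):
    """Clean companies with a high-risk supplier AND a high-risk customer."""
    suppliers, customers = _partners(edges)
    flagged = {}
    for company in set(suppliers) | set(customers):
        if company in high_risk:
            continue
        risky_in = suppliers[company] & high_risk
        risky_out = customers[company] & high_risk
        if risky_in and risky_out:
            flagged[company] = (sorted(risky_in), sorted(risky_out))
    return flagged


def association_score(company, edges, high_risk):
    """Fraction of a company's trading partners that are high risk."""
    suppliers, customers = _partners(edges)
    partners = suppliers[company] | customers[company]
    return len(partners & high_risk) / len(partners) if partners else 0.0


if __name__ == "__main__":
    print(bridging_companies(SELLS_TO, HIGH_RISK))
    print(round(association_score("CO6", SELLS_TO, HIGH_RISK), 2))
```

Here CO6 (Swift Co.), whose own director has no recorded status, is flagged because it buys from CO12 and sells to CO7. A score like this could be appended as a feature to the inputs of a conventional machine learning model.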


Figure 3: Utilizing networked data to establish risk scores for transactions and company associations, which can be used to enhance machine learning approaches

Summary conclusion

The native storage of fraud patterns as network phenomena, and the application of these patterns to fraud detection, is a powerful technique. This approach allows for the composition of ‘fraud libraries’ to capture rich details concerning schemes. Once encoded, tell-tale features of the fraud can be identified to give investigators indications of where to focus automated detection efforts. Additionally, storing and analysing network data leads to new types of indicators via network analysis: statistical measures and the ability to ‘score’ transactions and associations for aggregate risk. At the cutting edge, data on networks can be examined and simulated over time to gain new insights into how markets and transactions are evolving in character – a foundation for strategy formation and proactive preparedness.


About SARK7

Scott Allen Mongeau (SARK7) is an INFORMS Certified Analytics Professional (CAP) and a Data Scientist in the Cybersecurity business unit at SAS Institute. Scott has over 20 years of experience in project-focused analytics functions in a range of industries, including IT, biotech, pharma, materials, insurance, law enforcement, financial services, and start-ups. Scott is a part-time PhD (ABD) researcher at Nyenrode Business University. He holds a Global Executive MBA (OneMBA) and Masters in Financial Management from Erasmus Rotterdam School of Management (RSM). He has a Certificate in Finance from University of California at Berkeley Extension, an MA in Communication from the University of Texas at Austin, and a Graduate Degree (GD) in Applied Information Systems Management from the Royal Melbourne Institute of Technology (RMIT). He holds a BPhil from Miami University of Ohio. Having lived and worked in a number of countries, Scott is a dual American (native) and Dutch citizen. He may be contacted at webmaster@sark7.com. All posts are copyright © 2015 SARK7. All external materials utilized imply no ownership rights and are presented purely for educational purposes.


7 Comments on “Excuse me, do you speak fraud?”

  1. sctr7 Says:

    Another good example – bank fraud detection via graph / network analysis: http://gist.neo4j.org/?github-neo4j-contrib%2Fgists%2F%2Fother%2FBankFraudDetection.adoc


