How to Develop a Credit Card Fraud Detection Application using Memgraph, Flask, and D3.js

Introduction

There is a large and ever-growing number of use cases for graph databases and many of them are centered around one important functionality: relationship traversals. While in traditional relational databases the concept of foreign keys seems like a simple and efficient idea, the truth is that they result in very complex joins and self-joins when the dataset becomes too inter-related.

Prerequisites

Since you will be building a complete web application, there are a number of tools that you will need to install before getting started:

  • Flask: a very powerful web framework that provides you with tools, libraries, and technologies used in web development. A Flask application can be as small as a single web page or as complex as a management interface.
  • Docker and Compose: an open platform for developing, shipping, and running applications. It enables you to separate your application from your infrastructure (host machine). If you install Docker Desktop on Windows or macOS, Compose is already included. For Linux, visit this site.
  • Memgraph DB: a native fully distributed in-memory graph database built to handle real-time use-cases at an enterprise scale. Follow the Docker Installation instructions. While it’s completely optional, I encourage you to also install Memgraph Lab so you can execute Cypher queries on the database directly and see visualized results.

Understanding the Payment Fraud Detection Scenario

First, let’s define all the roles in this scenario:

  • Card — a credit card used for payment.
  • POS — a point of sale device that uses a card to execute transactions.
  • Transaction — a stored instance of buying something.

Defining the Graph Schema

Now that we have defined the scenario, it’s time to create the graph schema!

(:Card {compromised: false})<-[:Using]-(:Transaction)-[:At]->(:Pos {compromised: false})
(:Card {compromised: true})<-[:Using]-(:Transaction)-[:At]->(:Pos {compromised: true})
(:Card {compromised: true})<-[:Using]-(:Transaction {fraudReported: true})-[:At]->(:Pos)
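As a rough illustration, the three patterns above can be mirrored with plain Python data classes. This is only a sketch: the class names, field names, and the describe helper are hypothetical and are not part of the application or of Memgraph.

```python
from dataclasses import dataclass


@dataclass
class Card:
    id: int
    compromised: bool = False


@dataclass
class Pos:
    id: int
    compromised: bool = False


@dataclass
class Transaction:
    id: int
    card: Card  # the card the transaction is using
    pos: Pos    # the POS device the transaction happened at
    fraud_reported: bool = False


def describe(tx):
    """Return a label for the state of a transaction and its neighbours."""
    if tx.fraud_reported:
        return "fraud reported"
    if tx.card.compromised or tx.pos.compromised:
        return "compromised"
    return "clean"


# One example per schema pattern.
clean = Transaction(0, Card(0), Pos(0))
risky = Transaction(1, Card(1, compromised=True), Pos(1, compromised=True))
fraud = Transaction(2, Card(1, compromised=True), Pos(2), fraud_reported=True)

print(describe(clean), describe(risky), describe(fraud), sep=" | ")
```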

Building the Web Application Backbone

This is presumably the easy part. You need to create a simple Python web application using Flask to be your server. Let’s start by creating a root directory for your project and naming it card_fraud. There you need to create a requirements.txt file listing the required pip packages. For now, only one line is needed:

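Flask==1.1.2

Install the requirements:

pip3 install -r requirements.txt

Then create the file card_fraud.py in the root directory with the following code:

from flask import Flask

app = Flask(__name__)


@app.route('/')
@app.route('/index')
def index():
    return "Hello World"

Start the server:

export FLASK_APP=card_fraud.py
export FLASK_ENV=development
flask run --host 0.0.0.0

You should see output similar to this:

* Serving Flask app "card_fraud"
* Running on http://0.0.0.0:5000/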

Dockerizing the Application

In the root directory of the project create two files, Dockerfile and docker-compose.yml. At the beginning of the Dockerfile, you specify the parent image and instruct the container to install CMake, mgclient, and pymgclient. CMake and mgclient are necessary to install pymgclient, the Python driver for Memgraph DB.

FROM python:3.8

# Install CMake
RUN apt-get update && \
apt-get --yes install cmake

# Install mgclient
RUN apt-get install -y git cmake make gcc g++ libssl-dev && \
git clone https://github.com/memgraph/mgclient.git /mgclient && \
cd mgclient && \
git checkout dd5dcaaed5d7c8b275fbfd5d2ecbfc5006fa5826 && \
mkdir build && \
cd build && \
cmake .. && \
make && \
make install

# Install pymgclient
RUN git clone https://github.com/memgraph/pymgclient /pymgclient && \
cd pymgclient && \
python3 setup.py build && \
python3 setup.py install

# Install packages
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

COPY card_fraud.py /app/card_fraud.py
WORKDIR /app

ENV FLASK_ENV=development
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

ENTRYPOINT ["python3", "card_fraud.py"]

The docker-compose.yml file defines the two services, the database and the web application:

version: "3"
services:
  memgraph:
    image: "memgraph"
    ports:
      - "7687:7687"
  card_fraud:
    build: .
    volumes:
      - .:/app
    ports:
      - "5000:5000"
    environment:
      MG_HOST: memgraph
      MG_PORT: 7687
    depends_on:
      - memgraph

The project structure should now look like this:

card_fraud
├── card_fraud.py
├── docker-compose.yml
├── Dockerfile
└── requirements.txt

Build and start the containers:

docker-compose build
docker-compose up
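Keep in mind that depends_on only controls the order in which the containers start; it does not wait until Memgraph accepts connections. A minimal sketch of a workaround, assuming you want the application to poll the database port before connecting (wait_for_port is a hypothetical helper, not part of the original project):

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

You could call wait_for_port with the MG_HOST and MG_PORT values before creating the database connection. An open port only shows that the server is listening, so treat this as a best-effort check.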

Defining the Business Logic

At this point, you have a basic web server and a database instance. It’s time to add some useful functionalities to your app. To communicate with the database, your app needs some kind of OGM — Object Graph Mapping system. You can just reuse this one: custom OGM. Add the database directory with all of its contents to the root directory of your project.

import os

MG_HOST = os.getenv('MG_HOST', '127.0.0.1')
MG_PORT = int(os.getenv('MG_PORT', '7687'))
MG_USERNAME = os.getenv('MG_USERNAME', '')
MG_PASSWORD = os.getenv('MG_PASSWORD', '')
MG_ENCRYPTED = os.getenv('MG_ENCRYPT', 'false').lower() == 'true'

import logging
import time

log = logging.getLogger(__name__)

def init_log():
    logging.basicConfig(level=logging.INFO)
    log.info("Logging enabled")
    logging.getLogger("werkzeug").setLevel(logging.WARNING)

init_log()

from argparse import ArgumentParser

def parse_args():
    '''
    Parse command-line arguments.
    '''
    parser = ArgumentParser(description=__doc__)
    parser.add_argument("--app-host", default="0.0.0.0",
                        help="Allowed host addresses.")
    parser.add_argument("--app-port", default=5000, type=int,
                        help="App port.")
    parser.add_argument("--template-folder", default="public/template",
                        help="The folder with flask templates.")
    parser.add_argument("--static-folder", default="public",
                        help="The folder with flask static files.")
    parser.add_argument("--debug", default=True, action="store_true",
                        help="Run web server in debug mode")
    parser.add_argument('--clean-on-start', action='store_true',
                        help='Should the DB be emptied on script start')
    print(__doc__)
    return parser.parse_args()


args = parse_args()

from flask import Flask, Response, request, render_template
from database import Memgraph

db = Memgraph(host=MG_HOST, port=MG_PORT, username=MG_USERNAME,
              password=MG_PASSWORD, encrypted=MG_ENCRYPTED)
app = Flask(__name__,
            template_folder=args.template_folder,
            static_folder=args.static_folder,
            static_url_path='')

Clearing the Database

You need to start with an empty database so let’s implement a function to drop all the existing data from it:

def clear_db():
    """Clear the database."""

    db.execute_query("MATCH (n) DETACH DELETE n")
    log.info("Database cleared")

Adding Initial Cards and POS Devices

There is a fixed number of initial cards and POS devices that need to be added to the database at the beginning.

def init_data(card_count, pos_count):
    """Populate the database with initial Card and POS device entries."""

    log.info("Initializing {} cards and {} POS devices".format(
        card_count, pos_count))
    start_time = time.time()

    db.execute_query("UNWIND range(0, {} - 1) AS id "
                     "CREATE (:Card {{id: id, compromised: false}})".format(
                         card_count))
    db.execute_query("UNWIND range(0, {} - 1) AS id "
                     "CREATE (:Pos {{id: id, compromised: false}})".format(
                         pos_count))

    log.info("Initialized data in %.2f sec", time.time() - start_time)
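The queries above are assembled with string formatting, which works for trusted integers but becomes fragile once values come from a request. A hedged alternative is to build parameterized queries with $-placeholders. The sketch below only constructs the query text and the parameter dictionaries; build_init_queries is a hypothetical helper, and whether the OGM's execute_query accepts a parameter dictionary is an assumption you would need to check against the database module you copied.

```python
def build_init_queries(card_count, pos_count):
    """Return (query, params) pairs that create the initial Card and Pos nodes."""
    for name, value in (("card_count", card_count), ("pos_count", pos_count)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")

    return [
        ("UNWIND range(0, $count - 1) AS id "
         "CREATE (:Card {id: id, compromised: false})",
         {"count": card_count}),
        ("UNWIND range(0, $count - 1) AS id "
         "CREATE (:Pos {id: id, compromised: false})",
         {"count": pos_count}),
    ]


for query, params in build_init_queries(3, 2):
    print(query, params)
```

Because the counts travel as parameters, the braces in the Cypher text no longer need to be doubled.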

Adding a Single Compromised POS Device

All POS devices are initialized with the property compromised set to false, so you need a way to set it to true for a given device.

def compromise_pos(pos_id):
    """Mark a POS device as compromised."""

    db.execute_query(
        "MATCH (p:Pos {{id: {}}}) SET p.compromised = true".format(pos_id))
    log.info("Point of sale %d is compromised", pos_id)

Adding Multiple Random Compromised POS Devices

You can also compromise a set number of randomly selected POS devices at once.

from random import sample

def compromise_pos_devices(pos_count, fraud_count):
    """Compromise a number of random POS devices."""

    log.info("Compromising {} out of {} POS devices".format(
        fraud_count, pos_count))
    start_time = time.time()

    # sample() picks fraud_count distinct ids from 0..pos_count-1
    compromised_devices = sample(range(pos_count), fraud_count)
    for pos_id in compromised_devices:
        compromise_pos(pos_id)

    log.info("Compromising the devices took %.2f sec",
             time.time() - start_time)
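The selection relies on random.sample, which draws without replacement, so no device is picked twice, and which raises ValueError when asked for more items than the population holds. A small sketch of that behaviour, using a hypothetical pick_compromised helper and a seeded generator so runs are repeatable:

```python
from random import Random


def pick_compromised(pos_count, fraud_count, rng=None):
    """Pick fraud_count distinct POS ids from 0..pos_count-1."""
    rng = rng or Random()
    return rng.sample(range(pos_count), fraud_count)


picked = pick_compromised(10, 3, Random(42))
print(sorted(picked))

try:
    pick_compromised(2, 5)
except ValueError as err:
    # Asking for more devices than exist is rejected by sample().
    print("rejected:", err)
```

This is why the request handler for generating data checks that the number of frauds does not exceed the number of devices before calling the function.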

Adding Credit Card Transactions

This is where the data that drives fraud detection is generated. If the POS device is compromised, then the card in the transaction gets compromised too. If the card is compromised, there is a small chance, given by report_pct (for example 0.001 for 0.1%), that the transaction is fraudulent and detected (regardless of the POS device).

from random import randint

def pump_transactions(card_count, pos_count, tx_count, report_pct):
    """Create transactions. If the POS device is compromised,
    then the card in the transaction gets compromised too.
    If the card is compromised, there is a report_pct chance
    that the transaction is fraudulent and detected (regardless
    of the POS device)."""

    log.info("Creating {} transactions".format(tx_count))
    start_time = time.time()

    # report_pct is filled in first with %-formatting; the doubled braces
    # survive until str.format() inserts the card, POS and transaction ids.
    query = ("MATCH (c:Card {{id: {}}}), (p:Pos {{id: {}}}) "
             "CREATE (t:Transaction "
             "{{id: {}, fraudReported: c.compromised AND (rand() < %f)}}) "
             "CREATE (c)<-[:Using]-(t)-[:At]->(p) "
             "SET c.compromised = p.compromised" % report_pct)

    def rint(upper):
        return randint(0, upper - 1)

    for i in range(tx_count):
        db.execute_query(query.format(rint(card_count),
                                      rint(pos_count),
                                      i))

    duration = time.time() - start_time
    log.info("Created %d transactions in %.2f seconds", tx_count, duration)
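To see what the query does without a database, here is a minimal in-memory sketch of the same rules. The simulate function and its arguments are hypothetical; it only mirrors the order of operations in the Cypher statement: fraudReported is computed from the card's state before the SET clause runs, and the card then takes over the state of the POS device.

```python
from random import Random


def simulate(card_count, pos_count, tx_count, report_pct, bad_pos, seed=0):
    """Replay the transaction rules on plain Python lists."""
    rng = Random(seed)
    card_compromised = [False] * card_count
    pos_compromised = [i in bad_pos for i in range(pos_count)]
    transactions = []

    for tx_id in range(tx_count):
        card = rng.randrange(card_count)
        pos = rng.randrange(pos_count)
        # Evaluated against the card state *before* it is updated below.
        reported = card_compromised[card] and rng.random() < report_pct
        transactions.append(
            {"id": tx_id, "card": card, "pos": pos, "fraudReported": reported})
        # Mirrors "SET c.compromised = p.compromised".
        card_compromised[card] = pos_compromised[pos]

    return transactions, card_compromised


txs, cards = simulate(card_count=50, pos_count=10, tx_count=500,
                      report_pct=0.1, bad_pos={3})
print(sum(tx["fraudReported"] for tx in txs), "reported frauds")
```

Note that the last assignment overwrites the card's state, so in this model a card that is later used at a clean device is no longer marked as compromised.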

Resolving Transactions and Cards on a POS Device

You also need to have the functionality to resolve suspected fraud cases. This means marking all the connected components of a POS device as not compromised if they are cards and not fraudulent if they are transactions. This function is triggered by a POST request to the URL /resolve-pos. The request body contains the variable pos which specifies the id of the POS device.

import json

@app.route('/resolve-pos', methods=['POST'])
def resolve_pos():
    """Resolve a POS device and card as not compromised."""

    data = request.get_json(silent=True)
    start_time = time.time()

    db.execute_query("MATCH (p:Pos {{id: {}}}) "
                     "SET p.compromised = false "
                     "WITH p MATCH (p)--(t:Transaction)--(c:Card) "
                     "SET t.fraudReported = false, c.compromised = false".format(data['pos']))

    duration = time.time() - start_time
    log.info("Compromised Point of sale %s has been resolved in %.2f sec",
             data['pos'], duration)

    response = {"duration": duration}
    return Response(
        json.dumps(response),
        status=201,
        mimetype='application/json')
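Two details are worth guarding against here: request.get_json(silent=True) returns None when the body is not valid JSON, and the pos value is inserted directly into the Cypher string. A minimal sketch of a validation step, assuming POS ids are non-negative integers as generated earlier (parse_pos_id is a hypothetical helper, not part of the original code):

```python
def parse_pos_id(payload):
    """Extract a non-negative integer POS id from a decoded JSON body."""
    if not isinstance(payload, dict) or "pos" not in payload:
        raise ValueError("request body must be a JSON object with a 'pos' field")

    value = payload["pos"]
    if isinstance(value, int) and not isinstance(value, bool):
        pos_id = value
    elif isinstance(value, str) and value.strip().isdecimal():
        # Accept ids sent as strings, e.g. taken from a URL parameter.
        pos_id = int(value.strip())
    else:
        raise ValueError("'pos' must be an integer id")

    if pos_id < 0:
        raise ValueError("'pos' must not be negative")
    return pos_id


print(parse_pos_id({"pos": "7"}))
```

The handler could call this first and answer with a 400 status when ValueError is raised, so that only a plain integer ever reaches the query text.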

Fetching all Compromised POS Devices

This function searches the database for all POS devices that have more than one fraudulent transaction connected to them through the cards used at the device. It’s triggered by a GET request to the URL /get-compromised-pos.

@app.route('/get-compromised-pos', methods=['GET'])
def get_compromised_pos():
    """Get compromised POS devices."""

    log.info("Getting compromised point of sale IDs")
    start_time = time.time()

    data = db.execute_and_fetch(
        "MATCH (t:Transaction {fraudReported: true})-[:Using]->(:Card)"
        "<-[:Using]-(:Transaction)-[:At]->(p:Pos) "
        "WITH p.id as pos, count(t) as connected_frauds "
        "WHERE connected_frauds > 1 "
        "RETURN pos, connected_frauds ORDER BY connected_frauds DESC")
    data = list(data)

    log.info("Found %d POS with more than one fraud in %.2f sec",
             len(data), time.time() - start_time)

    return json.dumps(data)
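The traversal in this query can be hard to read at first. The following in-memory sketch reproduces the same counting logic under the assumption that transactions are plain dictionaries with the hypothetical keys id, card, pos, and fraudReported. For every reported fraud it walks to the card and then to the other transactions made with that card, and credits the POS devices of those transactions.

```python
from collections import Counter, defaultdict


def suspicious_pos(transactions, min_frauds=2):
    """Return (pos, connected_frauds) pairs, most suspicious first."""
    by_card = defaultdict(list)
    for tx in transactions:
        by_card[tx["card"]].append(tx)

    counts = Counter()
    for fraud in transactions:
        if not fraud["fraudReported"]:
            continue
        for other in by_card[fraud["card"]]:
            # A single MATCH never reuses a relationship, so the second
            # transaction in the pattern is always a different one.
            if other is fraud:
                continue
            counts[other["pos"]] += 1

    return [(pos, n) for pos, n in counts.most_common() if n >= min_frauds]


txs = [
    {"id": 0, "card": 0, "pos": 7, "fraudReported": False},
    {"id": 1, "card": 0, "pos": 1, "fraudReported": True},
    {"id": 2, "card": 1, "pos": 7, "fraudReported": False},
    {"id": 3, "card": 1, "pos": 2, "fraudReported": True},
    {"id": 4, "card": 2, "pos": 3, "fraudReported": False},
]
print(suspicious_pos(txs))
```

In the sample data, two different cards were used at POS 7 and later produced a reported fraud elsewhere, so POS 7 is the only device that crosses the threshold.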

Fetching all Fraudulent Transactions

With a very simple query, you can return all the transactions that are marked as fraudulent. The function is triggered by a GET request to the URL /get-fraudulent-transactions.

@app.route('/get-fraudulent-transactions', methods=['GET'])
def get_fraudulent_transactions():
    """Get fraudulent transactions."""

    log.info("Getting fraudulent transactions")
    start_time = time.time()

    data = db.execute_and_fetch(
        "MATCH (t:Transaction {fraudReported: true}) RETURN t.id as id")
    data = list(data)

    duration = time.time() - start_time
    log.info("Found %d fraudulent transactions in %.2f",
             len(data), duration)

    response = {"duration": duration, "fraudulent_txs": data}
    return Response(
        json.dumps(response),
        status=201,
        mimetype='application/json')

Generating Demo Data

Your app will have an option to generate a specified number of cards, POS devices, and transactions, so you need a function that will be responsible for creating them and marking a number of them as compromised/fraudulent. It’s triggered by a POST request to the URL /generate-data. The request body contains the variables:

  • pos: specifies the number of POS devices.
  • frauds: specifies the number of compromised POS devices.
  • cards: specifies the number of cards.
  • transactions: specifies the number of transactions.
  • reports: specifies the probability (between 0 and 1) that a transaction made with a compromised card is reported as fraudulent.

@app.route('/generate-data', methods=['POST'])
def generate_data():
    """Initialize the database."""

    data = request.get_json(silent=True)

    if data['pos'] < data['frauds']:
        return Response(
            json.dumps(
                {'error': "There can't be more frauds than devices"}),
            status=418,
            mimetype='application/json')

    start_time = time.time()

    clear_db()
    init_data(data['cards'], data['pos'])
    compromise_pos_devices(data['pos'], data['frauds'])
    pump_transactions(data['cards'], data['pos'],
                      data['transactions'], data['reports'])

    duration = time.time() - start_time

    response = {"duration": duration}
    return Response(
        json.dumps(response),
        status=201,
        mimetype='application/json')
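The handler only checks one relation between the fields. A more defensive variant could validate the whole body before touching the database. The sketch below is one possible shape, assuming the counts are non-negative integers and that reports is a probability, since it ends up being compared with rand() in the transaction query; validate_generate_payload is a hypothetical helper.

```python
REQUIRED_FIELDS = ("pos", "frauds", "cards", "transactions", "reports")


def validate_generate_payload(payload):
    """Check a decoded /generate-data body and return it unchanged."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")

    missing = [key for key in REQUIRED_FIELDS if key not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    for key in ("pos", "frauds", "cards", "transactions"):
        value = payload[key]
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"'{key}' must be a non-negative integer")

    reports = payload["reports"]
    if (isinstance(reports, bool) or not isinstance(reports, (int, float))
            or not 0 <= reports <= 1):
        raise ValueError("'reports' must be a number between 0 and 1")

    if payload["frauds"] > payload["pos"]:
        raise ValueError("There can't be more frauds than devices")
    return payload


example = {"pos": 10, "frauds": 2, "cards": 100,
           "transactions": 1000, "reports": 0.001}
print(validate_generate_payload(example))
```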

Fetching POS device Connected Components

This function finds all the connected components of a compromised POS device and returns them to the client. It’s triggered by a POST request to the URL /pos-graph.

@app.route('/pos-graph', methods=['POST'])
def host():
    log.info("Client fetching POS connected components")

    request_data = request.get_json(silent=True)

    data = db.execute_and_fetch(
        "MATCH (p1:Pos)<-[:At]-(t1:Transaction {{fraudReported: true}})-[:Using]"
        "->(c:Card)<-[:Using]-(t2:Transaction)-[:At]->(p2:Pos {{id: {}}}) "
        "RETURN p1, t1, c, t2, p2".format(request_data['pos']))
    data = list(data)

    # Keep only the node properties so the result can be serialized as JSON.
    output = []
    for item in data:
        p1 = item['p1'].properties
        t1 = item['t1'].properties
        c = item['c'].properties
        t2 = item['t2'].properties
        p2 = item['p2'].properties
        output.append({'p1': p1, 't1': t1, 'c': c, 't2': t2, 'p2': p2})

    return Response(
        json.dumps(output),
        status=200,
        mimetype='application/json')
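Each row in the response describes one path, so the same card or POS device can appear in several rows. Graph renderers usually want deduplicated node and link lists instead. The sketch below shows one way such a conversion could look; it is written in Python for consistency with the rest of the tutorial, it is not taken from the project's render.js, and the key, label, source, target, and type field names are assumptions.

```python
LABELS = {"p1": "Pos", "t1": "Transaction", "c": "Card",
          "t2": "Transaction", "p2": "Pos"}
EDGES = (("t1", "p1", "At"), ("t1", "c", "Using"),
         ("t2", "c", "Using"), ("t2", "p2", "At"))


def rows_to_graph(rows):
    """Flatten path rows into unique node and link lists."""
    nodes = {}
    links = set()
    for row in rows:
        # Ids repeat across labels, so combine label and id into one key.
        keys = {name: f"{LABELS[name]}:{row[name]['id']}" for name in LABELS}
        for name, key in keys.items():
            nodes[key] = {"key": key, "label": LABELS[name], **row[name]}
        for source, target, rel in EDGES:
            links.add((keys[source], keys[target], rel))

    link_list = [{"source": s, "target": t, "type": r}
                 for s, t, r in sorted(links)]
    return list(nodes.values()), link_list


rows = [
    {"p1": {"id": 1}, "t1": {"id": 10, "fraudReported": True},
     "c": {"id": 5}, "t2": {"id": 11}, "p2": {"id": 7}},
    {"p1": {"id": 2}, "t1": {"id": 12, "fraudReported": True},
     "c": {"id": 6}, "t2": {"id": 13}, "p2": {"id": 7}},
]
graph_nodes, graph_links = rows_to_graph(rows)
print(len(graph_nodes), "nodes,", len(graph_links), "links")
```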

Rendering Views

These functions will return the requested view. More on them in the Client-side Logic section. They are triggered by GET requests to the URLs / and /graph.

@app.route('/', methods=['GET'])
def index():
    return render_template('index.html')


@app.route('/graph', methods=['GET'])
def graph():
    return render_template('graph.html',
                           pos=request.args.get('pos'),
                           frauds=request.args.get('frauds'))

Creating the Main Function

The function main() has three jobs:

  • Clear the database if so specified in the input arguments.
  • Create indexes for the nodes Card, Pos and Transaction. You can learn more about indexing here.
  • Start the Flask server with the specified arguments.

def main():

    if args.clean_on_start:
        clear_db()

    db.execute_query("CREATE INDEX ON :Card(id)")
    db.execute_query("CREATE INDEX ON :Pos(id)")
    db.execute_query("CREATE INDEX ON :Transaction(fraudReported)")

    app.run(host=args.app_host, port=args.app_port, debug=args.debug)


if __name__ == "__main__":
    main()

Adding the Client-Side Logic

Now that your server is ready, let’s create the client-side logic for your web application. I’m sure that you’re not here for a front-end tutorial, so I leave it up to you to experiment and get to know the individual components. Just copy this public directory with all of its contents to the root directory of your project and add the following line to the Dockerfile under the line RUN pip3 install -r requirements.txt:

COPY public /app/public

The public directory has the following structure:

  • img: this directory contains images and animations.
  • js: this directory contains the JavaScript scripts.
      • graph.js: this script handles the graph.html page. It fetches all the connected components of a POS device, renders them in the form of a graph, and can resolve a POS device and all of its connected components as not fraudulent/compromised.
      • index.js: this script handles the index.html page. It initializes all of the necessary components, tells the server to generate the initial data, and fetches the fraudulent transactions.
      • render.js: this script handles the graph rendering on the graph.html page using the D3.js library.
  • libs: this directory contains all the locally stored libraries your application uses. For the purpose of this tutorial, we only included the memgraph-design library to style your pages.
  • template: this directory contains the HTML pages.
      • graph.html: this is the page that renders a graph of a compromised POS device with all of its connected components.
      • index.html: this is the main page of the application. In it, you can generate new demo data and retrieve the compromised POS devices.

Starting the App

It’s time to test your app. First, you need to build the Docker image by executing:

docker-compose build

Then start the containers:

docker-compose up

Conclusion

Relational database-management systems model data as a set of predetermined structures. Complex joins and self-joins are necessary when the dataset becomes too inter-related. Modern datasets require technically complex queries which are often very inefficient in real-time scenarios.

Ivan Despot

Developer Relations Engineer and Computer science graduate. I am also a nerd about network science, web development, and anime!