Raw data formats

This section briefly describes the relevant internal data formats. Most of the source data, namely instance data and raw QPU logs, are saved in JSON format. Intermediate computation results are usually stored as comma-separated values (i.e., plain-text tables in .csv files).

Problem instances

For each problem, we store the original instance data (in the 📁 instances/orig folder) and the respective QUBO formulation (in the 📁 instances/QUBO folder). These two files have the same basename and different suffixes: for example, the TSP instance with ID TSP1 is represented by two files:

  • ./instances/orig/TSP1_5_pr107.orig.json with original instance data, and

  • ./instances/QUBO/TSP1_16_5_pr107.qubo.json with QUBO formulation.

Both files are in the standard JSON format, which can be parsed by the json Python package or, for example, the jq command-line utility. Besides jq, any JSON editor or viewer can be used for visual inspection; one notable example is the JSON viewer built into the Firefox browser.
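As a minimal sketch of the json route mentioned above, the snippet below loads one instance file and lists its top-level fields together with the metadata subfields. The helper names load_instance and summarize are hypothetical (they are not part of the repository code), and the sketch assumes only that the file is a JSON object with an optional description field.

```python
import json


def load_instance(path):
    """Read one instance file (original or QUBO) into a Python dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize(instance):
    """Return sorted top-level field names and sorted metadata subfield names."""
    # 'description' holds the metadata subfields; fall back to an empty dict
    # if a file happens not to have it.
    description = instance.get("description", {})
    return sorted(instance.keys()), sorted(description.keys())


# Hypothetical usage with one of the files named above:
# qubo = load_instance("./instances/QUBO/TSP1_16_5_pr107.qubo.json")
# print(summarize(qubo))
```

The same helper works for both the original and the QUBO files, since it does not depend on the problem type.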

Below we specify the structure of the respective JSON files.

QUBO formulations

Filenames: ./instances/QUBO/*.qubo.json

JSON files corresponding to QUBO formulations have a universal format, regardless of the problem type:

| Field | Description |
|---|---|
| Q | quadratic coefficients matrix |
| P | linear coefficients vector |
| Const | constant (a number) |
| description | metadata in subfields: |
| ┖ instance_id | a unique instance ID |
| ┖ instance_type | TSP, UDMIS, or MWC |
| ┖ original_instance_name | original instance name (e.g., for TSP, the name from TSPLIB) |
| ┖ original_instance_file | filename for the original instance |
| ┖ contents | constant value QUBO |
| ┖ comment | a free-form string comment |

Note that internally in the code, we assume the following QUBO format:

\[\min \frac{1}{2} x^\prime Q x + x^\prime P + \text{Const}\]
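To make the convention above concrete, here is a minimal sketch that evaluates the objective for a given binary vector. It assumes that Q is stored as a nested list of rows and P as a flat list, which is an assumption about the JSON layout rather than something stated here; the function name qubo_objective is hypothetical.

```python
def qubo_objective(Q, P, const, x):
    """Evaluate 0.5 * x'Qx + x'P + Const for a binary vector x.

    Assumes Q is a square nested list (list of rows) and P is a flat list,
    both of the same dimension as x.
    """
    n = len(x)
    quadratic = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    linear = sum(P[i] * x[i] for i in range(n))
    return 0.5 * quadratic + linear + const


# Toy data (not taken from the dataset):
Q = [[0, 2], [2, 0]]
P = [-1, -1]
print(qubo_objective(Q, P, 1, [1, 1]))
```

Note the factor 1/2 in front of the quadratic term: with a symmetric Q, each off-diagonal pair is counted twice in the double sum, and the factor compensates for that.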

TSP instances

Filenames: instances/orig/TSP*.orig.json.

TSP instances are generated from the original TSPLIB instances. Namely, our dataset contains instances sampled from the following collection of TSPLIB instances:

att48, brazil58, eil101, gr666, hk48, kroA100, kroB100, kroC100, lin105,
pa561, pr107, pr299, rat575, swiss42, tsp225.

Each original instance file (present in the 📁 instances/orig folder) has the following structure:

| Field | Description |
|---|---|
| D | distance matrix |
| description | metadata in subfields: |
| ┖ instance_id | unique instance ID |
| ┖ instance_type | value TSP |
| ┖ original_instance_name | reference to the original instance (from TSPLIB) |
| ┖ contents | value Distance matrix D. |
| ┖ comments | a free-form string comment |
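As an illustration of how the D field can be used, the sketch below computes the length of a closed tour. It assumes that D is a square nested list indexed by 0-based node positions, which is not confirmed by the description above; the function name tour_length is hypothetical.

```python
def tour_length(D, tour):
    """Length of the closed tour that visits nodes in the given order.

    Assumes D[i][j] is the distance from node i to node j (0-based indices)
    and that the tour returns from its last node to its first.
    """
    n = len(tour)
    return sum(D[tour[k]][tour[(k + 1) % n]] for k in range(n))


# Toy distance matrix (not taken from the dataset):
D = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
print(tour_length(D, [0, 1, 2]))
```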

MWC (MaxCut) instances

Filenames: instances/orig/MWC*.json

| Field | Description |
|---|---|
| nodes | a list of node IDs (numbers) |
| edges | list of tuples (one per edge): |
| ┖ (int) | node ID: edge tail |
| ┖ (int) | node ID: edge head |
| ┖ (float) | edge weight |
| description | metadata in subfields: |
| ┖ instance_id | a unique instance ID |
| ┖ instance_type | value MWC |
| ┖ original_instance_name | original instance name (N<nodes>E<edges>_ERG_p<P>) |
| ┖ contents | value orig_MWC_G |
| ┖ comment | a free-form string comment |

Note that in original_instance_name, the parts N and E denote the number of nodes and edges, respectively, while p stands for the edge probability parameter of the random graph model (the Erdős–Rényi model).
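The edges field can be consumed directly once the file is parsed. The following minimal sketch computes the weight of a cut, assuming each edge is read from JSON as a three-element list [tail, head, weight] as described in the table; the function name cut_weight and the side mapping are hypothetical.

```python
def cut_weight(edges, side):
    """Total weight of the edges whose endpoints lie on different sides.

    edges: iterable of (tail, head, weight) triples.
    side:  dict mapping each node ID to 0 or 1 (the two parts of the cut).
    """
    return sum(w for tail, head, w in edges if side[tail] != side[head])


# Toy graph (not taken from the dataset):
edges = [[0, 1, 1.5], [1, 2, 2.0], [0, 2, 0.5]]
print(cut_weight(edges, {0: 0, 1: 1, 2: 0}))
```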

UD-MIS instances

Filenames: instances/orig/UDMIS*.json

| Field | Description |
|---|---|
| nodes | nodes in the graph |
| ┖ list[int] | (list of integer labels) |
| edges | list of edges |
| ┖ tuple (int, int) | (pairs of node labels) |
| description | metadata in subfields: |
| ┖ instance_id | a unique instance ID |
| ┖ instance_type | value UDMIS |
| ┖ original_instance_name | original instance name (N<nodes>W<width to height>_R<R / size>) |
| ┖ contents | value orig_UDMIS |
| ┖ wwidth | max x-coordinate of a point (for generation) |
| ┖ wheight | max y-coordinate of a point (for generation) |
| ┖ R | radius parameter (for generation) |
| ┖ points | points corresponding to vertices: |
| ┖ "(node_id)": (x, y) | a dict of point coordinates (x, y) keyed by the respective node ID |
| ┖ comment | a free-form string comment |
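For example, a candidate solution to a UD-MIS instance can be checked against the edges field. The sketch below is a minimal illustration, assuming each edge is read from JSON as a two-element list of node labels; the function name is_independent_set is hypothetical.

```python
def is_independent_set(edges, chosen):
    """Return True if no edge has both of its endpoints in 'chosen'."""
    chosen = set(chosen)
    return not any(u in chosen and v in chosen for u, v in edges)


# Toy path graph 0 - 1 - 2 (not taken from the dataset):
edges = [[0, 1], [1, 2]]
print(is_independent_set(edges, [0, 2]))
```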

QPU run logs

Raw QPU run logs are also JSON files; however, their format is relatively involved, as we tried to preserve as much data from each QPU run as possible. The specific fields from the raw log files that were used in our analysis can be inferred from the log-parsing source code, namely, the following functions:

Computed summaries

Intermediate summary tables in the 📁 run_logs folder, including the QPU shots data in run_logs/*/samples-csv, are essentially always plain-text tables with comma-separated values. They can be easily manipulated with pandas (in Python) or dplyr (in R), or opened in essentially any spreadsheet software, such as LibreOffice, for quick visual inspection.
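As a dependency-free alternative to the tools listed above, such a table can also be read with the csv module from the Python standard library. This is a minimal sketch that assumes the first row of the file is a header; the column names are not documented here, so the helper only reports whatever header it finds. The function name read_table is hypothetical.

```python
import csv


def read_table(path):
    """Read a comma-separated table; return (column names, list of row dicts).

    Assumes the first line of the file is a header row. All values are
    returned as strings, so numeric columns need explicit conversion.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return reader.fieldnames, rows


# Hypothetical usage (the path below is a placeholder, not a real file name):
# columns, rows = read_table("run_logs/some_table.csv")
# print(columns, len(rows))
```

With pandas installed, pandas.read_csv(path) gives the same data as a DataFrame with inferred column types.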