mcmc-comparisons

comparison of various Markov chain Monte Carlo algorithms

Table of contents

  • Purpose
  • Method
  • Dependencies for the installation phase
  • Runtime dependencies
  • Installation
  • Repository cloning
  • Running
  • CSV file format
  • Plot
  • Notes on running the experiments
  • References
  • License

Purpose

Compare different kinds of MCMC and AMCMC sampling algorithms to find out which ones are faster.

Method

The independent variable is the number of samples, i.e. the number of steps used to approximate the distribution. The dependent variables we want to measure are running time and probability.

Each experiment is repeated several times, each repetition running on a different processor thread with the same input parameters. The results of these repetitions are grouped by sample size so that average running times, average probabilities and their standard deviations can be computed. Plotting is carried out after these statistical computations.
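
A minimal sketch of this grouping step, written in Python with made-up numbers and hypothetical column names (this is not the actual code used by the repository's plot script), could look like this:

# Group raw results by sample size and compute the mean and standard
# deviation of the running time and of the probability for each group.
# Hypothetical rows: (run_number, samples, time_ms, probability).
from collections import defaultdict
import statistics

rows = [
    (1, 1000, 12.0, 0.51),
    (2, 1000, 14.0, 0.49),
    (1, 2000, 25.0, 0.50),
    (2, 2000, 27.0, 0.52),
]

by_samples = defaultdict(list)
for run, samples, time_ms, prob in rows:
    by_samples[samples].append((time_ms, prob))

for samples in sorted(by_samples):
    times = [t for t, _ in by_samples[samples]]
    probs = [p for _, p in by_samples[samples]]
    print(samples,
          statistics.mean(times), statistics.stdev(times),
          statistics.mean(probs), statistics.stdev(probs))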

Dependencies for the installation phase

The following packages are needed only during the installation phase. You may remove them after the installation if not needed.

| Package | Executable | Version command | Package version |
|---------|------------|-----------------|-----------------|
| git | /bin/git | $ git --version | git version 2.20.1 |
| GCC, the GNU Compiler Collection | /bin/gcc | $ gcc --version | gcc (GCC) 8.2.1 20181127 |
| GNU Make | /bin/make | $ make --version | GNU Make 4.2.1 |
| GNU Tar | /bin/tar | $ tar --version | tar (GNU tar) 1.30 |
| curl | /bin/curl | $ curl --version | curl 7.63.0 (x86_64-pc-linux-gnu) libcurl/7.63.0 OpenSSL/1.1.1a zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) libssh2/1.8.0 nghttp2/1.35.1 |

Runtime dependencies

You need to install the following packages and the ones listed for fbopt:

| Package | Executable | Version command | Package version |
|---------|------------|-----------------|-----------------|
| SWI-Prolog | /bin/swipl | $ swipl --version | SWI-Prolog version 7.7.19 for x86_64-linux |
| XSB Prolog | <Depends on installation> | $ xsb --version | XSB Version 3.8.0 (Three-Buck Chuck) of October 28, 2017 |
| Python 3 | /bin/python3 | $ python3 --version | Python 3.7.1 |
| Matplotlib | <Python 3 library> | - | 3.0.2 |
| NumPy | <Python 3 library> | - | 1.15.4 |
| GNU Parallel | /bin/parallel | $ parallel --version | parallel 20181222 |
| cplint | <SWI Prolog library> | - | v4.5.0 |
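
For the Python libraries marked with "-" in the version command column, you can check the installed versions directly from Python, for example:

import matplotlib
import numpy

# Print the installed Matplotlib and NumPy versions.
print(matplotlib.__version__, numpy.__version__)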

Installation

Arch Linux based distros

# pacman -S git gcc make tar curl
# pacman -S swi-prolog python3 python-matplotlib python-numpy parallel
$ swipl
?- pack_install(cplint).
$ mkdir --parents ~/build
$ cd ~/build
$ curl --output XSB.tar.gz http://xsb.sourceforge.net/downloads/XSB.tar.gz 
$ tar -xvzf XSB.tar.gz
$ cd XSB/build
$ ./configure
$ ./makexsb
$ cd ~
$ echo "alias xsb=\"~/build/XSB/bin/xsb\"" >> ~/.bashrc

Repository cloning

Since this repository contains submodules, you need to clone it using the appropriate flag:

$ git clone --recurse-submodules https://github.com/frnmst/mcmc-comparisons.git

Running

Help

Usage: run.sh [OPTIONS]
Run MCMC experiments

Mandatory arguments to long options are mandatory for short options too.
Options:
    --first-experiment-only             keep the first experiment and discard all
                                        the others while plotting. The CSV file
                                        will remain unaltered
    -g, --graph                         run the plot script after the experiments
    --graph-only                        run the plot script only
    -h, --help                          print this help
    --list-experiment-names             list the available experiment names
    --list-experiment-types             list the available experiment types
    -m, --min=MIN                       starting number of samples
    --multi-switch=RESAMPLINGPROB       enable multi switch instead of single
                                        switch for AMCMC XSB experiments with the
                                        specified resampling probability. This
                                        number goes from 0.0 to 1.0
    -M, --max=MAX                       ending number of samples
    -p, --parallel                      execute experiments on separate computing
                                        threads. If this option is enabled the
                                        number of threads is determined by the
                                        '--threads' option. If this option is
                                        disabled, the runs are executed
                                        consecutively
    --single-run-with-label=LABEL       run a single experiment with the specified
                                        run label. This option excludes
                                        both the '--parallel' option and the
                                        '--runs' option
    --print-flags                       print the enabled options. This can also
                                        be used to print the default options
    -r, --runs=RUNS, --threads=RUNS     the number of runs or computing threads.
                                        See the '--parallel' option for more
                                        information
    --memory=MEMORYMB                   the amount of memory in MB assigned
                                        to the SLURM job
    --no-remove-csv-files               avoid removing all csv files before
                                        running an experiment. This option defaults
                                        to false in all cases except when run
                                        with the '--graph-only' option. In that
                                        case the value of this option is
                                        fixed to true and cannot be changed.
                                        Set this option if you want to keep old
                                        results and append new ones to the
                                        same file. Normally, you should not set
                                        this option
    --output-file                       use a custom output filename.
                                        Defaults to '${experiment_name}.csv'
                                        where "${experiment_name}" is given
                                        by the '--experiment-name' option
    --partition=PARTITIONNAME           the partition name used for the
                                        SLURM job
    -s, --steps=STEPS                   the number of samples between
                                        consecutive iterations
    --swi-vs-xsb                        compare SWI and XSB experiments. In
                                        this case you need to specify both
                                        experiment names as the argument of
                                        the '--experiment-name' option,
                                        separated by a colon character, with
                                        the SWI experiment name first and the
                                        XSB experiment name second:
                                        'experiment_name_swi:experiment_name_xsb'
    -S, --slurm                         run the experiments with the SLURM system.
                                        This disables the '--graph', '--parallel'
                                        options and implies the
                                        '--single-run-with-label' option
    -t, --experiment-name=NAME          the name of the experiment
    -y, --experiment-type=NAME          the type of the experiment

Exit status:
 0  if OK,
 1  if an error occurred.

This software is released under the BSD 2-Clause license
Copyright (c) 2018-2019, Franco Masotti

Run locally

  • Go to the local directory

    $ cd ./src/local
    
  • Execute the run.sh script, for example like this:

    $ ./run.sh -p -g
    

Run on a SLURM queue

The purpose of using SLURM is to run the experiments in a multi-node setup.

Before continuing you must start the SLURM daemons:

$ cd ./includes
# ./run_slurm_daemons.sh

To run an experiment you simply need to use the SLURM option, like this:

$ ./run.sh -S

Plot

The plot needs to be rendered manually, like this:

$ ./run.sh --graph-only -t "${experiment_name}"

Configuration file

A configuration file for SLURM is available at ./src/includes/slurm.conf. This is the configuration file I use for my setup.

Note: enforcement of memory limits is disabled in the provided configuration file. See also the --mem option at https://slurm.schedmd.com/sbatch.html

CSV file format

The results of each test will be logged to a corresponding CSV file with the following line format, depending on the type of experiment:

run_number,current_samples,mh_time,mh_probability,gibbs_time,gibbs_probability\n

or

run_number,current_samples,adapt_on_time,adapt_on_probability,adapt_off_time,adapt_off_probability\n

or

run_number,current_samples,mh_time,mh_probability,gibbs_time,gibbs_probability,rejection_time,rejection_probability\n

where \n is the newline character (ASCII code 10).

mh_time, gibbs_time, adapt_on_time and adapt_off_time are expressed in milliseconds.
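
As an illustration, a file in the first format above could be read from Python like this (a sketch using a placeholder file name, not code taken from the repository):

import csv

# Column names for the first line format listed above.
fields = ["run_number", "current_samples",
          "mh_time", "mh_probability",
          "gibbs_time", "gibbs_probability"]

with open("experiment.csv", newline="") as handle:
    for record in csv.DictReader(handle, fieldnames=fields):
        # Times are in milliseconds; probabilities are floating point numbers.
        print(int(record["run_number"]),
              int(record["current_samples"]),
              float(record["mh_time"]),
              float(record["mh_probability"]))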

Plot

The tools necessary to plot the results are Python 3 and Matplotlib.

The plot script called plot_comparison.py can be found under the ./src/plot directory and it plots:

  1. the running time over samples
  2. the probability over samples, to determine the accuracy of the calculations
  3. the probability over running time, to determine convergence

Normally, plots 1 and 2 show the average of multiple runs. Error bars represent the standard deviation of the values. It is also possible to skip the averaging and consider only the first run via the --first-experiment-only option.
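
As an example, an errorbar plot of this kind can be produced with Matplotlib roughly as follows (a standalone sketch with made-up numbers, not an excerpt from plot_comparison.py):

import matplotlib.pyplot as plt

# Made-up aggregated results: number of samples, mean running time (ms)
# over the runs and its standard deviation.
samples = [1000, 2000, 3000, 4000]
mean_time = [12.0, 24.5, 37.1, 50.3]
std_time = [1.1, 1.9, 2.4, 3.0]

plt.errorbar(samples, mean_time, yerr=std_time, fmt="-o", capsize=3)
plt.xlabel("samples")
plt.ylabel("running time (ms)")
plt.title("running time over samples (example data)")
plt.savefig("running_time_over_samples.png")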

Notes on running the experiments

Sequential version

What follows is a pseudocode scheme for the sequential version of one of the experiments:

for j = 0, j < $runs, j++:
    for i = $min, i < $max, i += $step:
        samples = i
        time_mh = measure_mh(samples)
        time_gibbs = measure_gibbs(samples)
        print(samples, time_mh, time_gibbs)

Output files

When running the experiments in parallel, the result of each iteration is written to the same output file, thus producing unsorted output. To fix this you may run the following shell commands:

for output in $OUTPUTS; do
    # Sort lines by run number and keep the sample id in place so that we maintain the correct
    # input for the python script.
    cat "${output}" | tr ',' ' ' | sort -g -s -k 1 | tr ' ' ',' > "${output}".baak
    mv "${output}" "${output}".bak
    mv "${output}".baak "${output}"
done

plot_comparison.py, however, sorts the data internally, so if you use that script there is no need to fix the file by hand.
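
The same stable sort by run number can also be done with a few lines of Python (a sketch using placeholder file names, not code from the repository):

# Stable sort of a results CSV by run number (the first field), preserving
# the relative order of the sample sizes within each run, like `sort -g -s -k 1`.
with open("experiment.csv") as handle:
    lines = [line.rstrip("\n") for line in handle if line.strip()]

lines.sort(key=lambda line: int(line.split(",", 1)[0]))

with open("experiment_sorted.csv", "w") as handle:
    handle.write("\n".join(lines) + "\n")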

An alternative to GNU Parallel

Use background processes:

# "$5" is the number of runs to launch; "$2", "$3" and "$4" are forwarded
# to the Prolog script as-is.
declare -a job_id
for i in $(seq 1 "${5}"); do
    # Launch each run in the background and remember its process id.
    swipl -s experiments "${2}" "${3}" "${4}" "${i}" 1 &
    job_id+=("$!")
done
# Wait for all the background runs to terminate.
wait "${job_id[@]}"

The hmm experiment

Before running this experiment you need to apply the fix_hmm.patch file with git:

$ pushd ./src/prolog && git -C swish/ am --signoff ../fix_hmm.patch && popd

This patch was obtained from a modified version of hmm.pl using this command:

$ git format-patch HEAD~1..HEAD  --stdout > fix_hmm.patch

References

Algorithms

SLURM

License

See the LICENSE file.