Scripts

We provide a number of Python scripts to execute common tasks.

Note

During development, you can use ./develop.sh manage ... to execute these commands. In production, run them inside the appropriate container using python manage.py ....
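
For example, the listen-add-userid command documented below is run as

./develop.sh manage listen-add-userid

during development, and as

python manage.py listen-add-userid

inside the production container.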

ListenBrainz

These commands are helpful for running a ListenBrainz development instance and for miscellaneous other tasks.

./develop.sh manage

./develop.sh manage [OPTIONS] COMMAND [ARGS]...

add_missing_to_listen_users_metadata

./develop.sh manage add_missing_to_listen_users_metadata [OPTIONS]

delete_listens_and_update_metadata

Complete all pending listen deletes, then run the update script to refresh listen metadata modified since the last cron run.

./develop.sh manage delete_listens_and_update_metadata [OPTIONS]

delete_pending_listens

Complete all pending listen deletes since the last cron run.

./develop.sh manage delete_pending_listens [OPTIONS]

init_db

Initializes database.

This process involves several steps:
  1. Table structure is created.
  2. Primary keys and foreign keys are created.
  3. Indexes are created.

./develop.sh manage init_db [OPTIONS]

Options

-f, --force

Drop existing database and user.

--create-db

Create the database and user.
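
For example, a first-time setup would typically create the database and user and then build the tables (an illustrative invocation using the options above):

./develop.sh manage init_db --create-db

To drop and recreate an existing development database:

./develop.sh manage init_db --force --create-db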

init_msb_db

Initializes database.

This process involves several steps:
  1. Table structure is created.
  2. Primary keys and foreign keys are created.
  3. Indexes are created.

./develop.sh manage init_msb_db [OPTIONS]

Options

-f, --force

Drop existing database and user.

--create-db

Create the database and user.

init_ts_db

Initializes database.

This process involves several steps:
  1. Table structure is created.
  2. Indexes are created.
  3. Views are created.

./develop.sh manage init_ts_db [OPTIONS]

Options

-f, --force

Drop existing database and user.

--create-db

Create the database and user.

listen-add-userid

Fill in the listen.user_id field based on user_name.

./develop.sh manage listen-add-userid [OPTIONS]

notify_yim_users

./develop.sh manage notify_yim_users [OPTIONS]

recalculate_all_user_data

Recalculate all user timestamps and listen counts.

Note

ONLY USE THIS WHEN YOU KNOW WHAT YOU ARE DOING!

./develop.sh manage recalculate_all_user_data [OPTIONS]

run_websockets

./develop.sh manage run_websockets [OPTIONS]

Options

-h, --host <host>
Default: 0.0.0.0
-p, --port <port>
Default: 7082
-d, --debug

Turns debugging mode on or off. If specified, overrides the ‘DEBUG’ value in the config file.
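
For example, to run the websocket server on all interfaces on a non-default port with debugging enabled (the port value is illustrative):

./develop.sh manage run_websockets -h 0.0.0.0 -p 8082 -d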

set_rate_limits

./develop.sh manage set_rate_limits [OPTIONS] PER_TOKEN_LIMIT PER_IP_LIMIT WINDOW_SIZE

Arguments

PER_TOKEN_LIMIT

Required argument

PER_IP_LIMIT

Required argument

WINDOW_SIZE

Required argument
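
For example, to allow 50 requests per token and 30 requests per IP in each rate-limit window (all three values are illustrative):

./develop.sh manage set_rate_limits 50 30 10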

submit-release

Submit a release from MusicBrainz to the local ListenBrainz instance

Specify -u to submit using the token of the given user, or -t to supply a token directly.

./develop.sh manage submit-release [OPTIONS] RELEASEMBID

Options

-u, --user <user>
-t, --token <token>

Arguments

RELEASEMBID

Required argument
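
For example, to submit a release using the token of a local user (the user name is illustrative; <release_mbid> stands for the MusicBrainz ID of the release):

./develop.sh manage submit-release -u alice <release_mbid>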

update_user_emails

./develop.sh manage update_user_emails [OPTIONS]

update_user_listen_data

Scan the listen table and update the listen metadata for all users.

./develop.sh manage update_user_listen_data [OPTIONS]

Dump Manager

These commands are used to export and import dumps.

./develop.sh manage dump

./develop.sh manage dump [OPTIONS] COMMAND [ARGS]...

check_dump_ages

Check to make sure that data dumps are sufficiently fresh. Send mail if they are not.

./develop.sh manage dump check_dump_ages [OPTIONS]

create_feedback

Create a spark formatted dump of user/recommendation feedback data.

./develop.sh manage dump create_feedback [OPTIONS]

Options

-l, --location <location>

path to the directory where the dump should be made

-t, --threads <threads>

the number of threads to use during compression

create_full

Create a ListenBrainz data dump which includes a private dump, a statistics dump and a dump of the actual listens from the listenstore.

Args:
  location (str): path to the directory where the dump should be made
  threads (int): the number of threads to use during compression
  dump_id (int): the ID of the ListenBrainz data dump
  do_listen_dump: If True, make a listens dump
  do_spark_dump: If True, make a spark listens dump
  do_db_dump: If True, make a public/private postgres dump
  do_timescale_dump: If True, make a public/private timescale dump

./develop.sh manage dump create_full [OPTIONS]

Options

-l, --location <location>

path to the directory where the dump should be made

-t, --threads <threads>

the number of threads to use during compression

--dump-id <dump_id>

the ID of the ListenBrainz data dump

--listen, --no-listen
--spark, --no-spark
--db, --no-db
--timescale, --no-timescale
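
For example, to create a full dump in a given directory with four compression threads, skipping the spark listens dump (the path and values are illustrative):

./develop.sh manage dump create_full -l /data/dumps -t 4 --no-spark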

create_incremental

./develop.sh manage dump create_incremental [OPTIONS]

Options

-l, --location <location>
-t, --threads <threads>
--dump-id <dump_id>

create_parquet

./develop.sh manage dump create_parquet [OPTIONS]

delete_old_dumps

./develop.sh manage dump delete_old_dumps [OPTIONS] LOCATION

Arguments

LOCATION

Required argument

import_dump

Import a ListenBrainz dump into the database.

Args:
  private_archive (str): the path to the ListenBrainz private dump to be imported
  private_timescale_archive (str): the path to the ListenBrainz private timescale dump to be imported
  public_archive (str): the path to the ListenBrainz public dump to be imported
  public_timescale_archive (str): the path to the ListenBrainz public timescale dump to be imported
  listen_archive (str): the path to the ListenBrainz listen dump archive to be imported
  threads (int): the number of threads to use during decompression, defaults to 1

Note

This method tries to import the private db dump first, followed by the public db dump. However, in the absence of a private dump, it imports sanitized versions of the user table in the public dump in order to satisfy foreign key constraints. Then it imports the listen dump.

./develop.sh manage dump import_dump [OPTIONS]

Options

-pr, --private-archive <private_archive>

the path to the ListenBrainz private dump to be imported

--private-timescale-archive <private_timescale_archive>

the path to the ListenBrainz private timescale dump to be imported

-pu, --public-archive <public_archive>

the path to the ListenBrainz public dump to be imported

--public-timescale-archive <public_timescale_archive>

the path to the ListenBrainz public timescale dump to be imported

-l, --listen-archive <listen_archive>

the path to the ListenBrainz listen dump archive to be imported

-t, --threads <threads>

the number of threads to use during decompression, defaults to 1
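
For example, to import private, public and listen dumps with four decompression threads (the paths are illustrative):

./develop.sh manage dump import_dump -pr /data/private-dump.tar.xz -pu /data/public-dump.tar.xz -l /data/listens-dump.tar.xz -t 4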

import_yim_playlists

Import playlist excerpts into the YIM data table from a dump file.

Note

First copy the dump into the container from which the script is to be run.

Args:
  patch_slug (str): The slug of the troi patch that generated these playlists.
  dump_file (str): The dump file to import. For each user, it should contain three lines: user_name, playlist_mbid, JSPF data.

./develop.sh manage dump import_yim_playlists [OPTIONS] PATCH_SLUG DUMP_FILE

Arguments

PATCH_SLUG

Required argument

DUMP_FILE

Required argument
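
For example, assuming the dump file has already been copied into the container (the patch slug and file path are illustrative):

./develop.sh manage dump import_yim_playlists top-discoveries-for-year /tmp/yim_playlists.dump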

ListenBrainz Spark

These commands are used to interact with the Spark Cluster.

python spark_manage.py

python spark_manage.py [OPTIONS] COMMAND [ARGS]...

request_consumer

Invoke the script responsible for the request consumer.

python spark_manage.py request_consumer [OPTIONS]

./develop.sh manage spark

./develop.sh manage spark [OPTIONS] COMMAND [ARGS]...

cron_request_all_stats

./develop.sh manage spark cron_request_all_stats [OPTIONS]

cron_request_recommendations

./develop.sh manage spark cron_request_recommendations [OPTIONS]

cron_request_similar_users

./develop.sh manage spark cron_request_similar_users [OPTIONS]

request_candidate_sets

Send the cluster a request to generate candidate sets.

./develop.sh manage spark request_candidate_sets [OPTIONS]

Options

--days <days>

Request recommendations to be generated from the listening history of the given number of days

--top <top>

Calculate the given number of top artists.

--similar <similar>

Calculate the given number of similar artists.

--html

Enable/disable HTML file generation

--user-name <users>

Generate candidate sets for the given users. By default, generate for all active users.
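
For example, to generate candidate sets from the last week of history for a single user (the values and user name are illustrative):

./develop.sh manage spark request_candidate_sets --days 7 --top 20 --similar 40 --user-name rob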

request_dataframes

Send the cluster a request to create dataframes.

./develop.sh manage spark request_dataframes [OPTIONS]

Options

--days <days>

Request the model to be trained on data from the given number of days

--job-type <job_type>

The type of dataframes to request. ‘recommendation_recording’ or ‘similar_users’ are allowed.

--listens-threshold <listens_threshold>

The minimum number of listens a user should have to be included in the dataframes.
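
For example, to request recommendation dataframes built from six months of data for users with at least 50 listens (the values are illustrative):

./develop.sh manage spark request_dataframes --days 180 --job-type recommendation_recording --listens-threshold 50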

request_import_artist_relation

Send the spark cluster a request to import artist relation.

./develop.sh manage spark request_import_artist_relation [OPTIONS]

request_import_full

Send the cluster a request to import a new full data dump

./develop.sh manage spark request_import_full [OPTIONS]

Options

--id <id_>

Optional. ID of the full dump to import; defaults to the latest dump available on the FTP server
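
For example, to import a specific full dump instead of the latest one (the ID is illustrative):

./develop.sh manage spark request_import_full --id 742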

request_import_incremental

Send the cluster a request to import a new incremental data dump

./develop.sh manage spark request_import_incremental [OPTIONS]

Options

--id <id_>

Optional. ID of the incremental dump to import; defaults to the latest dump available on the FTP server

request_import_musicbrainz_release_dump

Send the spark cluster a request to import musicbrainz release dump.

./develop.sh manage spark request_import_musicbrainz_release_dump [OPTIONS]

request_listens_per_day

Send the spark cluster a request to calculate the listens-per-day stat

./develop.sh manage spark request_listens_per_day [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_missing_mb_data

Send the cluster a request to generate missing MB data.

./develop.sh manage spark request_missing_mb_data [OPTIONS]

Options

--days <days>

Request missing MusicBrainz data based on listen data from the given number of days

request_model

Send the cluster a request to train the model.

For more details refer to https://spark.apache.org/docs/2.1.0/mllib-collaborative-filtering.html

./develop.sh manage spark request_model [OPTIONS]

Options

--rank <rank>

Number of hidden features

--itr <itr>

Number of iterations to run.

--lmbda <lmbda>

Controls overfitting.

--alpha <alpha>

Baseline level of confidence weighting applied.

--use-transformed-listencounts

Whether to apply a transformation function to the listen counts or use the original listen counts
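
For example, a training request with explicit hyperparameters (the values are illustrative starting points, not recommended settings):

./develop.sh manage spark request_model --rank 10 --itr 20 --lmbda 0.1 --alpha 3.0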

request_recommendations

Send the cluster a request to generate recommendations.

./develop.sh manage spark request_recommendations [OPTIONS]

Options

--top <top>

Generate given number of top artist recommendations

--similar <similar>

Generate given number of similar artist recommendations

--raw <raw>

Generate given number of raw recommendations

--user-name <users>

Generate recommendations for the given users. By default, generate recommendations for all users.
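
For example, to generate 200 top artist and 200 similar artist recommendations for a single user (the values and user name are illustrative):

./develop.sh manage spark request_recommendations --top 200 --similar 200 --user-name rob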

request_recording_discovery

Send the cluster a request to generate recording discovery data.

./develop.sh manage spark request_recording_discovery [OPTIONS]

request_similar_users

Send the cluster a request to generate similar users.

./develop.sh manage spark request_similar_users [OPTIONS]

Options

--max-num-users <max_num_users>

The maximum number of similar users to return for any given user.

request_sitewide_stats

Send the spark cluster a request to calculate sitewide stats

./develop.sh manage spark request_sitewide_stats [OPTIONS]

Options

--type <type_>

Required. Type of statistics to calculate

Options: entity | listening_activity

--range <range_>

Required. Time range of statistics to calculate

Options: week | month | quarter | half_yearly | year | all_time | this_week | this_month | this_year

--entity <entity>

Entity for which statistics should be calculated

Options: artists | releases | recordings
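
For example, to request sitewide artist statistics for the past week, using the option values listed above:

./develop.sh manage spark request_sitewide_stats --type entity --range week --entity artists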

request_user_stats

Send a user stats request to the spark cluster

./develop.sh manage spark request_user_stats [OPTIONS]

Options

--type <type_>

Required. Type of statistics to calculate

Options: entity | listening_activity | daily_activity

--range <range_>

Required. Time range of statistics to calculate

Options: week | month | quarter | half_yearly | year | all_time | this_week | this_month | this_year

--entity <entity>

Entity for which statistics should be calculated

Options: artists | releases | recordings
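
For example, to request every user's daily activity statistics for the current month, using the option values listed above:

./develop.sh manage spark request_user_stats --type daily_activity --range this_month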

request_year_in_music

Send the cluster a request to generate all year in music statistics.

./develop.sh manage spark request_year_in_music [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yearly_listen_count

Send the spark cluster a request to calculate the yearly listen count stat

./develop.sh manage spark request_yearly_listen_count [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yim_day_of_week

Send the spark cluster a request to calculate the most listened day of the week

./develop.sh manage spark request_yim_day_of_week [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yim_most_listened_year

Send the spark cluster a request to calculate the most listened year stat

./develop.sh manage spark request_yim_most_listened_year [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yim_most_prominent_color

Send the spark cluster a request to calculate the most prominent color stat

./develop.sh manage spark request_yim_most_prominent_color [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yim_new_release_stats

Send the spark cluster a request to calculate new release stats

./develop.sh manage spark request_yim_new_release_stats [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yim_similar_users

Send the cluster a request to generate similar users for Year in Music.

./develop.sh manage spark request_yim_similar_users [OPTIONS]

Options

--year <year>

Year for which to calculate the stat

request_yim_top_stats

Send the spark cluster a request to calculate top stats

./develop.sh manage spark request_yim_top_stats [OPTIONS]

Options

--year <year>

Year for which to calculate the stat