Scripts

We have a bunch of python scripts to execute common tasks.

Note

During development, you can use ./develop.sh manage ... to execute the commands. In production, the command should be run inside the appropriate container using python manage.py ....

ListenBrainz

These commands are helpful in running a ListenBrainz development instance and some other miscellaneous tasks.

./develop.sh manage

./develop.sh manage [OPTIONS] COMMAND [ARGS]...

add_missing_to_listen_users_metadata

./develop.sh manage add_missing_to_listen_users_metadata [OPTIONS]

clear-expired-do-not-recommends

Delete expired do not recommend entries from database

./develop.sh manage clear-expired-do-not-recommends [OPTIONS]

delete-old-user-data-exports

Delete old and expired user data exports

./develop.sh manage delete-old-user-data-exports [OPTIONS]

delete_listens

Complete all pending listen deletes and also run update script for updating listen metadata since last cron run

./develop.sh manage delete_listens [OPTIONS]

delete_pending_listens

Complete all pending listen deletes since last cron run

./develop.sh manage delete_pending_listens [OPTIONS]

init_db

Initializes database.

This process involves several steps:

Table structure is created.
Primary keys and foreign keys are created.
Indexes are created.

./develop.sh manage init_db [OPTIONS]

Options

-f, --force: Drop existing database and user.

--create-db: Create the database and user.

init_ts_db

Initializes database.

This process involves several steps:

Table structure is created.
Indexes are created.
Views are created

./develop.sh manage init_ts_db [OPTIONS]

Options

-f, --force: Drop existing database and user.

--create-db: Create the database and user.

notify_yim_users

./develop.sh manage notify_yim_users [OPTIONS]

Options

--year <year>: Year for which to send the emails

recalculate_all_user_data

Recalculate all user timestamps and listen counts.

Note

ONLY USE THIS WHEN YOU KNOW WHAT YOU ARE DOING!

./develop.sh manage recalculate_all_user_data [OPTIONS]

refresh-top-manual-mappings

Refresh top manual msid-mbid mappings view

./develop.sh manage refresh-top-manual-mappings [OPTIONS]

run-daily-jams

Generate daily playlists for users soon after the new day begins in their timezone. This is an internal LB method and not a core function of troi.

./develop.sh manage run-daily-jams [OPTIONS]

Options

--create-all: Create the daily jams for all users. if false (default), only for users according to timezone.

run-metadata-cache-seeder

Query external services’ new releases api for new releases and submit those to our cache as seeds

./develop.sh manage run-metadata-cache-seeder [OPTIONS]

set_rate_limits

./develop.sh manage set_rate_limits [OPTIONS] PER_TOKEN_LIMIT PER_IP_LIMIT
                                    WINDOW_SIZE

Arguments

PER_TOKEN_LIMIT: Required argument

PER_IP_LIMIT: Required argument

WINDOW_SIZE: Required argument

submit-release

Submit a release from MusicBrainz to the local ListenBrainz instance

Specify -u to use the token of this user when submitting, or -t to specify a specific token.

./develop.sh manage submit-release [OPTIONS] RELEASEMBID

Options

-u, --user <user>

-t, --token <token>

Arguments

RELEASEMBID: Required argument

update-msid-tables

Scan tables using msids to find matching mbids from mapping tables and update them.

./develop.sh manage update-msid-tables [OPTIONS]

update_db

Updates the datbase by running the specified SQL file at FILENAME

./develop.sh manage update_db [OPTIONS] FILENAME

Arguments

FILENAME: Required argument

update_user_emails

./develop.sh manage update_user_emails [OPTIONS]

update_user_listen_data

Scans listen table and update listen metadata for all users

./develop.sh manage update_user_listen_data [OPTIONS]

Dump Manager

These commands are used to export and import dumps.

./develop.sh manage dump

./develop.sh manage dump [OPTIONS] COMMAND [ARGS]...

check_dump_ages

Check to make sure that data dumps are sufficiently fresh. Send mail if they are not.

./develop.sh manage dump check_dump_ages [OPTIONS]

create_feedback

Create a spark formatted dump of user/recommendation feedback data.

./develop.sh manage dump create_feedback [OPTIONS]

Options

-l, --location <location>: path to the directory where the dump should be made

-t, --threads <threads>: the number of threads to be used while compression

create_full

Create a ListenBrainz data dump which includes a private dump, a statistics dump and a dump of the actual listens from the listenstore.

./develop.sh manage dump create_full [OPTIONS]

Options

-l, --location <location>: path to the directory where the dump should be made

-lp, --location-private <location_private>: path to the directory where the private dumps should be made

-t, --threads <threads>: the number of threads to be used while compression

--dump-id <dump_id>: the ID of the ListenBrainz data dump

--listen, --no-listen: If True, make a listens dump

--spark, --no-spark: If True, make a spark listens dump

--db, --no-db: If True, make a public/private postgres dump

--timescale, --no-timescale: If True, make a public/private timescale dump

--stats, --no-stats: If True, make a couchdb stats dump

-lt, --location-temp <location_temp>: path to directory to use for creating necessary temporary files during dumps.

-lpt, --location-private-temp <location_private_temp>: path to directory to use for creating necessary temporary files during private dumps.

create_incremental

./develop.sh manage dump create_incremental [OPTIONS]

Options

-l, --location <location>

-t, --threads <threads>

--dump-id <dump_id>

create_mbcanonical

Create a dump of the canonical mapping tables. This includes the following items:

metadata for canonical recordings

canonical recording redirect

canonical release redirect

These tables are created by the mapping canonical-data management command.

If canonical-data is called with –use-lb-conn then the canonical metadata and recording redirect tables will be in the listenbrainz timescale database connection
If called with –use-mb-conn then all tables will be in the musicbrainz database connection.
The canonical release redirect table will always be in the musicbrainz database connection.

./develop.sh manage dump create_mbcanonical [OPTIONS]

Options

-l, --location <location>: path to the directory where the dump should be made

--use-lb-conn, --use-mb-conn: Dump the metadata table from the listenbrainz database

create_sample

Create a sample data dump.

./develop.sh manage dump create_sample [OPTIONS]

Options

-l, --location <location>: path to the directory where the dump should be made

-t, --threads <threads>: the number of threads to be used while compression

delete_old_dumps

./develop.sh manage dump delete_old_dumps [OPTIONS] LOCATION

Arguments

LOCATION: Required argument

import_dump

Import a ListenBrainz dump into the database.

Args:: private_archive (str): the path to the ListenBrainz private dump to be imported private_timescale_archive (str): the path to the ListenBrainz private timescale dump to be imported public_archive (str): the path to the ListenBrainz public dump to be imported public_timescale_archive (str): the path to the ListenBrainz public timescale dump to be imported listen_archive (str): the path to the ListenBrainz listen dump archive to be imported sample_archive (str): the path to the ListenBrainz sample dump archive to be imported threads (int): the number of threads to use during decompression, defaults to 1

Note

This method tries to import the private db dump first, followed by the public db dump. However, in absence of a private dump, it imports sanitized versions of the user table in the public dump in order to satisfy foreign key constraints. Then it imports the listen dump.

./develop.sh manage dump import_dump [OPTIONS]

Options

-pr, --private-archive <private_archive>: the path to the ListenBrainz private dump to be imported

--private-timescale-archive <private_timescale_archive>: the path to the ListenBrainz private timescale dump to be imported

-pu, --public-archive <public_archive>: the path to the ListenBrainz public dump to be imported

--public-timescale-archive <public_timescale_archive>: the path to the ListenBrainz public timescale dump to be imported

-l, --listen-archive <listen_archive>: the path to the ListenBrainz listen dump archive to be imported

-s, --sample-archive <sample_archive>: the path to the ListenBrainz sample dump archive to be imported

-t, --threads <threads>: the number of threads to use during decompression, defaults to 1

ListenBrainz Spark

These commands are used to interact with the Spark Cluster.

./develop.sh manage spark

./develop.sh manage spark [OPTIONS] COMMAND [ARGS]...

cron_request_all_stats

./develop.sh manage spark cron_request_all_stats [OPTIONS]

cron_request_popularity

./develop.sh manage spark cron_request_popularity [OPTIONS]

cron_request_recommendations

./develop.sh manage spark cron_request_recommendations [OPTIONS]

cron_request_similar_users

./develop.sh manage spark cron_request_similar_users [OPTIONS]

cron_request_similarity_datasets

./develop.sh manage spark cron_request_similarity_datasets 
    [OPTIONS]

request_compact_listens

Send a request to spark cluster to compact listens imported from listenbrainz

./develop.sh manage spark request_compact_listens [OPTIONS]

request_dataframes

Send the cluster a request to create dataframes.

./develop.sh manage spark request_dataframes [OPTIONS]

Options

--days <days>: Request model to be trained on data of given number of days

--job-type <job_type>: The type of dataframes to request. ‘recommendation_recording’ or ‘similar_users’ are allowed.

--listens-threshold <listens_threshold>: The minimum number of listens a user should have to be included in the dataframes.

request_entity_stats

Send an entity stats request to the spark cluster

./develop.sh manage spark request_entity_stats [OPTIONS]

Options

--type <type_>

Required Type of statistics to calculate

Options:: listeners

--range <range_>

Required Time range of statistics to calculate

Options:: this_week | this_month | this_year | week | month | quarter | year | half_yearly | all_time

--entity <entity>

Entity for which statistics should be calculated

Options:: artists | release_groups

--database <database>: Name of the couchdb database to store data in

request_fresh_releases

Send the cluster a request to generate release radar data.

./develop.sh manage spark request_fresh_releases [OPTIONS]

Options

--days <days>: Number of days of listens to consider for artist listening data

--database <database>: Name of the couchdb database to store data in

--threshold <threshold>: Number of days of listens to consider for artist listening data

request_import_deleted_listens

Send a request to spark cluster to import deleted listens from listenbrainz

./develop.sh manage spark request_import_deleted_listens [OPTIONS]

request_import_full

Send the cluster a request to import a new full data dump

./develop.sh manage spark request_import_full [OPTIONS]

Options

--id <id_>: Optional. ID of the full dump to import, defaults to latest dump available on FTP server

--use-local: Use local dump instead of FTP

request_import_incremental

Send the cluster a request to import a new incremental data dump

./develop.sh manage spark request_import_incremental [OPTIONS]

Options

--id <id_>: Optional. ID of the incremental dump to import, defaults to latest dump available on FTP server

--use-local: Use local dump instead of FTP

request_import_mlhd_dump

Send the spark cluster a request to import musicbrainz release dump.

./develop.sh manage spark request_import_mlhd_dump [OPTIONS]

request_import_pg_tables

Send the cluster a request to import metadata table from MB db postgres

./develop.sh manage spark request_import_pg_tables [OPTIONS]

request_import_sample

Send the cluster a request to import a sample dump

./develop.sh manage spark request_import_sample [OPTIONS]

request_missing_mb_data

Send the cluster a request to generate missing MB data.

./develop.sh manage spark request_missing_mb_data [OPTIONS]

Options

--days <days>: Request missing musicbrainz data based on listen data of given number of days

request_model

Send the cluster a request to train the model.

For more details refer to https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html

./develop.sh manage spark request_model [OPTIONS]

Options

--rank <rank>: Number of hidden features

--itr <itr>: Number of iterations to run.

--lmbda <lmbda>: Controls over fitting.

--alpha <alpha>: Baseline level of confidence weighting applied.

--use-transformed-listencounts: Whether to apply a transformation function on the listencounts or use original listen playcounts

request_per_artist_popularity

Request mlhd popularity data using the specified dataset.

./develop.sh manage spark request_per_artist_popularity [OPTIONS]

Options

--use-mlhd: Use MLHD+ data or ListenBrainz listens data

--entity <entity>

Required

Options:: recording | release | release_group

request_popularity

Request mlhd popularity data using the specified dataset.

./develop.sh manage spark request_popularity [OPTIONS]

Options

--use-mlhd: Use MLHD+ data or ListenBrainz listens data

--entity <entity>

Required

Options:: artist | recording | release | release_group

request_recommendations

Send the cluster a request to generate recommendations.

./develop.sh manage spark request_recommendations [OPTIONS]

Options

--raw <raw>: Generate given number of raw recommendations

--user-name <users>: Generate recommendations for given users. Generate recommendations for all users by default.

request_recording_discovery

Send the cluster a request to generate recording discovery data.

./develop.sh manage spark request_recording_discovery [OPTIONS]

request_similar_artists

Send the cluster a request to generate similar artists index.

./develop.sh manage spark request_similar_artists [OPTIONS]

Options

--days <days>: Required The number of days of listens to use.

--session <session>: Required The maximum duration in seconds between two listens in a listening session.

--contribution <contribution>: Required The maximum contribution a user’s listens can make to the similarity score of a artist pair.

--threshold <threshold>: Required The minimum similarity score to include a recording pair in the simlarity index.

--limit <limit>: Required The maximum number of similar artists to generate per artist (the limit is instructive. upto 2x artists may be returned than the limit).

--skip <skip>: Required the minimum difference threshold to mark track as skipped

--production: Required whether the dataset is being created as a production dataset. affects how the resulting dataset is stored in LB.

request_similar_recordings

Send the cluster a request to generate similar recordings index.

./develop.sh manage spark request_similar_recordings [OPTIONS]

Options

--days <days>: The number of days of listens to use. required if using listens data

--use-mlhd: Use MLHD+ data or ListenBrainz listens data

--session <session>: Required The maximum duration in seconds between two listens in a listening session.

--max-contribution <max_contribution>: Required The maximum contribution a user’s listens can make to the similarity score of a recording pair.

--threshold <threshold>: Required The minimum similarity score to include a recording pair in the simlarity index.

--limit <limit>: Required The maximum number of similar recordings to generate per recording (the limit is instructive. upto 2x recordings may be returned than the limit).

--skip-threshold <skip_threshold>: Required the minimum difference threshold to mark track as skipped

--only-stage2: whether to reuse existing outputs of intermediate chunks

--production: Required whether the dataset is being created as a production dataset. affects how the resulting dataset is stored in LB.

request_similar_users

Send the cluster a request to generate similar users.

./develop.sh manage spark request_similar_users [OPTIONS]

Options

--max-num-users <max_num_users>: The maxiumum number of similar users to return for any given user.

request_sitewide_stats

Send request to calculate sitewide stats to the spark cluster

./develop.sh manage spark request_sitewide_stats [OPTIONS]

Options

--type <type_>

Required Type of statistics to calculate

Options:: entity | listening_activity | era_activity | artist_evolution_activity

--range <range_>

Required Time range of statistics to calculate

Options:: this_week | this_month | this_year | week | month | quarter | year | half_yearly | all_time

--entity <entity>

Entity for which statistics should be calculated

Options:: artists | releases | recordings | release_groups

request_tags

Generate the tags dataset with percent rank

./develop.sh manage spark request_tags [OPTIONS]

request_troi_playlists

Bulk generate troi playlists for all users

./develop.sh manage spark request_troi_playlists [OPTIONS]

Options

--slug <slug>

Required

Options:: weekly-jams | weekly-exploration

--create-all: whether to create the periodic playlists for all users or only for users according to timezone.

request_user_stats

Send a user stats request to the spark cluster

./develop.sh manage spark request_user_stats [OPTIONS]

Options

--type <type_>

Required Type of statistics to calculate

Options:: entity | listening_activity | daily_activity | artist_evolution_activity | era_activity | genre_activity

--range <range_>

Required Time range of statistics to calculate

Options:: this_week | this_month | this_year | week | month | quarter | year | half_yearly | all_time

--entity <entity>

Entity for which statistics should be calculated

Options:: artists | releases | recordings | release_groups

--database <database>: Name of the couchdb database to store data in

request_year_in_music

Send the cluster a request to generate all year in music statistics.

./develop.sh manage spark request_year_in_music [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

--import-pg-tables, --no-import-pg-tables: whether to import the pg tables before generating the stats.

--export-to-spotify, --no-export-to-spotify: Whether to export the generated playlists to Spotify.

request_yim_artist_evolution

Calculate artist evolution activity for Year in Music

./develop.sh manage spark request_yim_artist_evolution [OPTIONS]

Options

--year <year>: Year for which to calculate artist evolution

request_yim_day_of_week

Send request to calculate most listened day of week to the spark cluster

./develop.sh manage spark request_yim_day_of_week [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_genre_activity

Calculate genre activity for Year in Music

./develop.sh manage spark request_yim_genre_activity [OPTIONS]

Options

--year <year>: Year for which to calculate genre activity

request_yim_listen_count

Send request to calculate yearly listen count stat to the spark cluster

./develop.sh manage spark request_yim_listen_count [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_listening_time

Send request to calculate yearly total listening time stat for each user to the spark cluster

./develop.sh manage spark request_yim_listening_time [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_listens_per_day

Send request to calculate listens per day stat to the spark cluster

./develop.sh manage spark request_yim_listens_per_day [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_most_listened_year

Send request to calculate most listened year stat to the spark cluster

./develop.sh manage spark request_yim_most_listened_year [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_new_artists_discovered

Send request to calculate count of new artists user listened to this year.

./develop.sh manage spark request_yim_new_artists_discovered 
    [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_new_release_stats

Send request to calculate new release stats to the spark cluster

./develop.sh manage spark request_yim_new_release_stats [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_similar_users

Send the cluster a request to generate similar users for Year in Music.

./develop.sh manage spark request_yim_similar_users [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_top_discoveries

Send the cluster a request to generate tracks of the year data and then once the data has been imported generate YIM playlists.

./develop.sh manage spark request_yim_top_discoveries [OPTIONS]

Options

--year <year>: Year for which to generate the playlists

--export-to-spotify, --no-export-to-spotify: Whether to export the generated playlists to Spotify.

request_yim_top_genres

Send request to calculate top genres each user listened to this year.

./develop.sh manage spark request_yim_top_genres [OPTIONS]

Options

--year <year>: Year for which to calculate the stat

request_yim_top_missed_recordings

Send the cluster a request to generate tracks of the year data and then once the data has been imported generate YIM playlists.

./develop.sh manage spark request_yim_top_missed_recordings 
    [OPTIONS]

Options

--year <year>: Year for which to generate the playlists

--export-to-spotify, --no-export-to-spotify: Whether to export the generated playlists to Spotify.

request_yim_top_stats

Send request to calculate top stats to the spark cluster

./develop.sh manage spark request_yim_top_stats [OPTIONS]

Options

--year <year>: Year for which to calculate the stat