Scripts¶
We have a number of Python scripts for executing common tasks.
Note
During development, you can use ./develop.sh manage ... to execute the commands. In production, the commands should be run inside the appropriate container using python manage.py ....
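For example, to list all available commands and their help text (an illustrative invocation; --help is the standard help flag for these Click-based scripts):
./develop.sh manage --help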
ListenBrainz¶
These commands are helpful in running a ListenBrainz development instance and some other miscellaneous tasks.
./develop.sh manage¶
./develop.sh manage [OPTIONS] COMMAND [ARGS]...
add_missing_to_listen_users_metadata¶
./develop.sh manage add_missing_to_listen_users_metadata [OPTIONS]
clear-expired-do-not-recommends¶
Delete expired "do not recommend" entries from the database
./develop.sh manage clear-expired-do-not-recommends [OPTIONS]
deduplicate-msb-listens¶
Migrate the listens table to the new schema.
./develop.sh manage deduplicate-msb-listens [OPTIONS]
delete_listens¶
Complete all pending listen deletes, and also run the update script for listen metadata since the last cron run
./develop.sh manage delete_listens [OPTIONS]
delete_pending_listens¶
Complete all pending listen deletes since the last cron run
./develop.sh manage delete_pending_listens [OPTIONS]
fixup-recording-msid-tables¶
Migrate the listens table to the new schema.
./develop.sh manage fixup-recording-msid-tables [OPTIONS]
init_db¶
Initializes the database.
- This process involves several steps:
Table structure is created.
Primary keys and foreign keys are created.
Indexes are created.
./develop.sh manage init_db [OPTIONS]
Options
- -f, --force¶
Drop existing database and user.
- --create-db¶
Create the database and user.
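For a first-time local setup, an illustrative invocation that creates the database and user before initializing the schema:
./develop.sh manage init_db --create-db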
init_ts_db¶
Initializes the Timescale database.
- This process involves several steps:
Table structure is created.
Indexes are created.
Views are created.
./develop.sh manage init_ts_db [OPTIONS]
Options
- -f, --force¶
Drop existing database and user.
- --create-db¶
Create the database and user.
listen-add-userid¶
Fill in the listen.user_id field based on user_name.
./develop.sh manage listen-add-userid [OPTIONS]
listen-migrate¶
Migrate the listens table to the new schema.
./develop.sh manage listen-migrate [OPTIONS]
msb-transfer-db¶
Transfer MessyBrainz (MsB) tables from the MsB database to the Timescale (TS) database
./develop.sh manage msb-transfer-db [OPTIONS]
notify_yim_users¶
./develop.sh manage notify_yim_users [OPTIONS]
Options
- --year <year>¶
Year for which to send the emails
recalculate_all_user_data¶
Recalculate all user timestamps and listen counts.
Note
ONLY USE THIS WHEN YOU KNOW WHAT YOU ARE DOING!
./develop.sh manage recalculate_all_user_data [OPTIONS]
refresh-top-manual-mappings¶
Refresh top manual msid-mbid mappings view
./develop.sh manage refresh-top-manual-mappings [OPTIONS]
run-daily-jams¶
Generate daily playlists for users soon after the new day begins in their timezone. This is an internal LB method and not a core function of troi.
./develop.sh manage run-daily-jams [OPTIONS]
Options
- --create-all¶
Create the daily jams for all users. If not set (default), create them only for users whose timezone has just begun a new day.
run-metadata-cache-seeder¶
Query external services’ new-releases APIs and submit the results to our cache as seeds
./develop.sh manage run-metadata-cache-seeder [OPTIONS]
run_websockets¶
./develop.sh manage run_websockets [OPTIONS]
Options
- -h, --host <host>¶
- Default
0.0.0.0
- -p, --port <port>¶
- Default
7082
- -d, --debug¶
Turns debugging mode on or off. If specified, overrides the ‘DEBUG’ value in the config file.
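An illustrative invocation that binds the documented defaults explicitly and enables debugging:
./develop.sh manage run_websockets -h 0.0.0.0 -p 7082 -d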
set_rate_limits¶
./develop.sh manage set_rate_limits [OPTIONS] PER_TOKEN_LIMIT PER_IP_LIMIT WINDOW_SIZE
Arguments
- PER_TOKEN_LIMIT¶
Required argument
- PER_IP_LIMIT¶
Required argument
- WINDOW_SIZE¶
Required argument
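All three arguments are positional. An illustrative invocation (the numeric values are placeholders, not recommended limits):
./develop.sh manage set_rate_limits 50 30 10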
spotify-add-userid¶
Fill in the Spotify user ID using the connected user’s OAuth token.
./develop.sh manage spotify-add-userid [OPTIONS]
submit-release¶
Submit a release from MusicBrainz to the local ListenBrainz instance.
Specify -u to submit using that user’s token, or -t to provide a token directly.
./develop.sh manage submit-release [OPTIONS] RELEASEMBID
Options
- -u, --user <user>¶
- -t, --token <token>¶
Arguments
- RELEASEMBID¶
Required argument
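An illustrative invocation (the user name and MBID below are placeholders):
./develop.sh manage submit-release -u some_user 11111111-1111-1111-1111-111111111111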
update-msid-tables¶
Scan tables using msids to find matching mbids from mapping tables and update them.
./develop.sh manage update-msid-tables [OPTIONS]
update_user_emails¶
./develop.sh manage update_user_emails [OPTIONS]
update_user_listen_data¶
Scan the listen table and update listen metadata for all users
./develop.sh manage update_user_listen_data [OPTIONS]
Dump Manager¶
These commands are used to export and import dumps.
./develop.sh manage dump¶
./develop.sh manage dump [OPTIONS] COMMAND [ARGS]...
check_dump_ages¶
Check to make sure that data dumps are sufficiently fresh. Send mail if they are not.
./develop.sh manage dump check_dump_ages [OPTIONS]
create_feedback¶
Create a Spark-formatted dump of user/recommendation feedback data.
./develop.sh manage dump create_feedback [OPTIONS]
Options
- -l, --location <location>¶
path to the directory where the dump should be made
- -t, --threads <threads>¶
the number of threads to use during compression
create_full¶
Create a ListenBrainz data dump which includes a private dump, a statistics dump and a dump of the actual listens from the listenstore.
- Args:
location (str): path to the directory where the dump should be made
threads (int): the number of threads to use during compression
dump_id (int): the ID of the ListenBrainz data dump
do_listen_dump: If True, make a listens dump
do_spark_dump: If True, make a spark listens dump
do_db_dump: If True, make a public/private postgres dump
do_timescale_dump: If True, make a public/private timescale dump
do_stats_dump: If True, make a couchdb stats dump
./develop.sh manage dump create_full [OPTIONS]
Options
- -l, --location <location>¶
path to the directory where the dump should be made
- -t, --threads <threads>¶
the number of threads to use during compression
- --dump-id <dump_id>¶
the ID of the ListenBrainz data dump
- --listen, --no-listen¶
- --spark, --no-spark¶
- --db, --no-db¶
- --timescale, --no-timescale¶
- --stats, --no-stats¶
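For example, an illustrative full dump that skips the stats dump and uses four compression threads (the path is a placeholder):
./develop.sh manage dump create_full -l /path/to/dumps -t 4 --no-stats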
create_incremental¶
./develop.sh manage dump create_incremental [OPTIONS]
Options
- -l, --location <location>¶
- -t, --threads <threads>¶
- --dump-id <dump_id>¶
create_mbcanonical¶
- Create a dump of the canonical mapping tables. This includes the following items:
metadata for canonical recordings
canonical recording redirect
canonical release redirect
These tables are created by the mapping canonical-data management command. If canonical-data is called with --use-lb-conn then the canonical metadata and recording redirect tables will be in the listenbrainz timescale database connection. If called with --use-mb-conn then all tables will be in the musicbrainz database connection. The canonical release redirect table will always be in the musicbrainz database connection.
./develop.sh manage dump create_mbcanonical [OPTIONS]
Options
- -l, --location <location>¶
path to the directory where the dump should be made
- --use-lb-conn, --use-mb-conn¶
Dump the metadata table from the listenbrainz database
create_parquet¶
./develop.sh manage dump create_parquet [OPTIONS]
delete_old_dumps¶
./develop.sh manage dump delete_old_dumps [OPTIONS] LOCATION
Arguments
- LOCATION¶
Required argument
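An illustrative invocation (the path is a placeholder):
./develop.sh manage dump delete_old_dumps /path/to/dumps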
import_dump¶
Import a ListenBrainz dump into the database.
- Args:
private_archive (str): the path to the ListenBrainz private dump to be imported
private_timescale_archive (str): the path to the ListenBrainz private timescale dump to be imported
public_archive (str): the path to the ListenBrainz public dump to be imported
public_timescale_archive (str): the path to the ListenBrainz public timescale dump to be imported
listen_archive (str): the path to the ListenBrainz listen dump archive to be imported
threads (int): the number of threads to use during decompression, defaults to 1
Note
This method tries to import the private db dump first, followed by the public db dump. However, in the absence of a private dump, it imports sanitized versions of the user table in the public dump in order to satisfy foreign key constraints. Then it imports the listen dump.
./develop.sh manage dump import_dump [OPTIONS]
Options
- -pr, --private-archive <private_archive>¶
the path to the ListenBrainz private dump to be imported
- --private-timescale-archive <private_timescale_archive>¶
the path to the ListenBrainz private timescale dump to be imported
- -pu, --public-archive <public_archive>¶
the path to the ListenBrainz public dump to be imported
- --public-timescale-archive <public_timescale_archive>¶
the path to the ListenBrainz public timescale dump to be imported
- -l, --listen-archive <listen_archive>¶
the path to the ListenBrainz listen dump archive to be imported
- -t, --threads <threads>¶
the number of threads to use during decompression, defaults to 1
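An illustrative invocation that imports only a listen dump with four decompression threads (the path is a placeholder):
./develop.sh manage dump import_dump -l /path/to/listens-dump.tar.xz -t 4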
ListenBrainz Spark¶
These commands are used to interact with the Spark Cluster.
python spark_manage.py¶
python spark_manage.py [OPTIONS] COMMAND [ARGS]...
request_consumer¶
Invoke the script responsible for the request consumer
python spark_manage.py request_consumer [OPTIONS]
./develop.sh manage spark¶
./develop.sh manage spark [OPTIONS] COMMAND [ARGS]...
cron_request_all_stats¶
./develop.sh manage spark cron_request_all_stats [OPTIONS]
cron_request_recommendations¶
./develop.sh manage spark cron_request_recommendations [OPTIONS]
cron_request_similar_users¶
./develop.sh manage spark cron_request_similar_users [OPTIONS]
cron_request_similarity_datasets¶
./develop.sh manage spark cron_request_similarity_datasets [OPTIONS]
request_dataframes¶
Send the cluster a request to create dataframes.
./develop.sh manage spark request_dataframes [OPTIONS]
Options
- --days <days>¶
Request the model to be trained on the given number of days of data
- --job-type <job_type>¶
The type of dataframes to request. ‘recommendation_recording’ or ‘similar_users’ are allowed.
- --listens-threshold <listens_threshold>¶
The minimum number of listens a user should have to be included in the dataframes.
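An illustrative request for recommendation dataframes (the values are placeholders):
./develop.sh manage spark request_dataframes --days 180 --job-type recommendation_recording --listens-threshold 50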
request_entity_stats¶
Send an entity stats request to the spark cluster
./develop.sh manage spark request_entity_stats [OPTIONS]
Options
- --type <type_>¶
Required Type of statistics to calculate
- Options
listeners
- --range <range_>¶
Required Time range of statistics to calculate
- Options
week | month | quarter | half_yearly | year | all_time | this_week | this_month | this_year
- --entity <entity>¶
Entity for which statistics should be calculated
- Options
artists | release_groups
- --database <database>¶
Name of the couchdb database to store data in
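An illustrative request built from the documented option values (the optional database name is omitted):
./develop.sh manage spark request_entity_stats --type listeners --range week --entity artists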
request_fresh_releases¶
Send the cluster a request to generate release radar data.
./develop.sh manage spark request_fresh_releases [OPTIONS]
Options
- --days <days>¶
Number of days of listens to consider for artist listening data
- --database <database>¶
Name of the couchdb database to store data in
request_import_artist_relation¶
Send the spark cluster a request to import the artist relation.
./develop.sh manage spark request_import_artist_relation [OPTIONS]
request_import_full¶
Send the cluster a request to import a new full data dump
./develop.sh manage spark request_import_full [OPTIONS]
Options
- --id <id_>¶
Optional. ID of the full dump to import; defaults to the latest dump available on the FTP server
request_import_incremental¶
Send the cluster a request to import a new incremental data dump
./develop.sh manage spark request_import_incremental [OPTIONS]
Options
- --id <id_>¶
Optional. ID of the incremental dump to import; defaults to the latest dump available on the FTP server
request_import_mlhd_dump¶
Send the spark cluster a request to import the MLHD dump.
./develop.sh manage spark request_import_mlhd_dump [OPTIONS]
request_import_musicbrainz_release_dump¶
Send the spark cluster a request to import the musicbrainz release dump.
./develop.sh manage spark request_import_musicbrainz_release_dump [OPTIONS]
request_import_pg_tables¶
Send the cluster a request to import metadata tables from the MusicBrainz postgres database
./develop.sh manage spark request_import_pg_tables [OPTIONS]
request_missing_mb_data¶
Send the cluster a request to generate missing MB data.
./develop.sh manage spark request_missing_mb_data [OPTIONS]
Options
- --days <days>¶
Request missing musicbrainz data based on listen data for the given number of days
request_mlhd_popularity¶
Request mlhd popularity data.
./develop.sh manage spark request_mlhd_popularity [OPTIONS]
request_model¶
Send the cluster a request to train the model.
For more details refer to https://spark.apache.org/docs/2.1.0/mllib-collaborative-filtering.html
./develop.sh manage spark request_model [OPTIONS]
Options
- --rank <rank>¶
Number of hidden features
- --itr <itr>¶
Number of iterations to run.
- --lmbda <lmbda>¶
Controls overfitting.
- --alpha <alpha>¶
Baseline level of confidence weighting applied.
- --use-transformed-listencounts¶
Whether to apply a transformation function to the listen counts or use the original playcounts
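An illustrative training request (the hyperparameter values are placeholders, not tuned defaults):
./develop.sh manage spark request_model --rank 10 --itr 10 --lmbda 0.1 --alpha 3.0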
request_recommendations¶
Send the cluster a request to generate recommendations.
./develop.sh manage spark request_recommendations [OPTIONS]
Options
- --raw <raw>¶
Generate the given number of raw recommendations
- --user-name <users>¶
Generate recommendations for the given users; by default, recommendations are generated for all users.
request_recording_discovery¶
Send the cluster a request to generate recording discovery data.
./develop.sh manage spark request_recording_discovery [OPTIONS]
request_similar_artists¶
Send the cluster a request to generate the similar artists index.
./develop.sh manage spark request_similar_artists [OPTIONS]
Options
- --days <days>¶
Required The number of days of listens to use.
- --session <session>¶
Required The maximum duration in seconds between two listens in a listening session.
- --contribution <contribution>¶
Required The maximum contribution a user’s listens can make to the similarity score of an artist pair.
- --threshold <threshold>¶
Required The minimum similarity score to include an artist pair in the similarity index.
- --limit <limit>¶
Required The maximum number of similar artists to generate per artist (the limit is indicative; up to 2x the limit may be returned).
- --skip <skip>¶
Required The minimum difference threshold to mark a track as skipped.
- --production¶
Required Whether the dataset is being created as a production dataset. This affects how the resulting dataset is stored in LB.
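An illustrative request with all required options set (the values are placeholders, not tuned parameters):
./develop.sh manage spark request_similar_artists --days 180 --session 300 --contribution 5 --threshold 10 --limit 100 --skip 30 --production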
request_similar_recordings¶
Send the cluster a request to generate the similar recordings index.
./develop.sh manage spark request_similar_recordings [OPTIONS]
Options
- --days <days>¶
Required The number of days of listens to use.
- --session <session>¶
Required The maximum duration in seconds between two listens in a listening session.
- --contribution <contribution>¶
Required The maximum contribution a user’s listens can make to the similarity score of a recording pair.
- --threshold <threshold>¶
Required The minimum similarity score to include a recording pair in the similarity index.
- --limit <limit>¶
Required The maximum number of similar recordings to generate per recording (the limit is indicative; up to 2x the limit may be returned).
- --skip <skip>¶
Required The minimum difference threshold to mark a track as skipped.
- --production¶
Required Whether the dataset is being created as a production dataset. This affects how the resulting dataset is stored in LB.
request_similar_recordings_mlhd¶
Send the cluster a request to generate the similar recordings index from the MLHD data.
./develop.sh manage spark request_similar_recordings_mlhd [OPTIONS]
Options
- --session <session>¶
Required The maximum duration in seconds between two listens in a listening session.
- --contribution <contribution>¶
Required The maximum contribution a user’s listens can make to the similarity score of a recording pair.
- --threshold <threshold>¶
Required The minimum similarity score to include a recording pair in the similarity index.
- --limit <limit>¶
Required The maximum number of similar recordings to generate per recording (the limit is indicative; up to 2x the limit may be returned).
- --skip <skip>¶
Required The minimum difference threshold to mark a track as skipped.
request_similar_users¶
Send the cluster a request to generate similar users.
./develop.sh manage spark request_similar_users [OPTIONS]
Options
- --max-num-users <max_num_users>¶
The maximum number of similar users to return for any given user.
request_sitewide_stats¶
Send a request to the spark cluster to calculate sitewide stats
./develop.sh manage spark request_sitewide_stats [OPTIONS]
Options
- --type <type_>¶
Required Type of statistics to calculate
- Options
entity | listening_activity
- --range <range_>¶
Required Time range of statistics to calculate
- Options
week | month | quarter | half_yearly | year | all_time | this_week | this_month | this_year
- --entity <entity>¶
Entity for which statistics should be calculated
- Options
artists | releases | recordings | release_groups
request_troi_playlists¶
Bulk generate troi playlists for all users
./develop.sh manage spark request_troi_playlists [OPTIONS]
Options
- --slug <slug>¶
Required
- Options
weekly-jams | weekly-exploration
- --create-all¶
Whether to create the periodic playlists for all users, or only for users according to timezone.
request_user_stats¶
Send a user stats request to the spark cluster
./develop.sh manage spark request_user_stats [OPTIONS]
Options
- --type <type_>¶
Required Type of statistics to calculate
- Options
entity | listening_activity | daily_activity | listeners
- --range <range_>¶
Required Time range of statistics to calculate
- Options
week | month | quarter | half_yearly | year | all_time | this_week | this_month | this_year
- --entity <entity>¶
Entity for which statistics should be calculated
- Options
artists | releases | recordings | release_groups
- --database <database>¶
Name of the couchdb database to store data in
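An illustrative request for each user’s weekly top artists (the optional database name is omitted):
./develop.sh manage spark request_user_stats --type entity --range week --entity artists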
request_year_in_music¶
Send the cluster a request to generate all year in music statistics.
./develop.sh manage spark request_year_in_music [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
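An illustrative invocation (the year is a placeholder):
./develop.sh manage spark request_year_in_music --year 2023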
request_yim_artist_map¶
Send the cluster a request to generate artist map data and then, once the data has been imported, generate the YIM artist map.
./develop.sh manage spark request_yim_artist_map [OPTIONS]
Options
- --year <year>¶
Year for which to generate the artist map
request_yim_day_of_week¶
Send a request to the spark cluster to calculate the most listened day of the week
./develop.sh manage spark request_yim_day_of_week [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_listen_count¶
Send a request to the spark cluster to calculate the yearly listen count stat
./develop.sh manage spark request_yim_listen_count [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_listening_time¶
Send a request to the spark cluster to calculate the yearly total listening time stat for each user
./develop.sh manage spark request_yim_listening_time [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_listens_per_day¶
Send a request to the spark cluster to calculate the listens per day stat
./develop.sh manage spark request_yim_listens_per_day [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_most_listened_year¶
Send a request to the spark cluster to calculate the most listened year stat
./develop.sh manage spark request_yim_most_listened_year [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_new_artists_discovered¶
Send a request to the spark cluster to calculate the count of new artists a user listened to this year.
./develop.sh manage spark request_yim_new_artists_discovered [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_new_release_stats¶
Send a request to the spark cluster to calculate new release stats
./develop.sh manage spark request_yim_new_release_stats [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_playlists¶
Send the cluster a request to generate tracks of the year data and then, once the data has been imported, generate the YIM playlists.
./develop.sh manage spark request_yim_playlists [OPTIONS]
Options
- --year <year>¶
Year for which to generate the playlists
request_yim_similar_users¶
Send the cluster a request to generate similar users for Year in Music.
./develop.sh manage spark request_yim_similar_users [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat
request_yim_top_stats¶
Send a request to the spark cluster to calculate top stats
./develop.sh manage spark request_yim_top_stats [OPTIONS]
Options
- --year <year>¶
Year for which to calculate the stat