How to crawl lyrics from Spotify with syrics and spotipy

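Almost every music site identifies songs by a predefined track id, so, as in the earlier approach, the script is split into two stages: first get the track id for a song, then fetch the lyrics for that track id. Here the search for the track id is done with spotipy, and fetching the lyrics by track id is done with syrics.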

Searching for the track id by song title + artist name with spotipy
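As a rough sketch of this search step, the snippet below builds a query from the song title and artist name and pulls the first track id out of the search response. The helper names (build_query, first_track_id, search_track_id) are made up for illustration, and the client id and client secret are assumed to come from a Spotify developer application, which this post does not cover. The spotipy import sits inside the network function so the two pure helpers can be tried without spotipy installed.

```python
from typing import Optional


def build_query(title: str, artist: str) -> str:
    """Build a Spotify search query using the track: and artist: field filters."""
    return f"track:{title} artist:{artist}"


def first_track_id(results: dict) -> Optional[str]:
    """Return the id of the first track in a search response, or None if there is none."""
    items = results.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None


def search_track_id(title: str, artist: str,
                    client_id: str, client_secret: str) -> Optional[str]:
    """Search Spotify for a track and return its id (needs network access)."""
    # Imported here so the helpers above work even without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Assumption: credentials from a Spotify developer application.
    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    client = spotipy.Spotify(auth_manager=auth)
    results = client.search(q=build_query(title, artist), type="track", limit=1)
    return first_track_id(results)
```

A call such as search_track_id("Beat It", "Michael Jackson", client_id, client_secret) would then give the track id to pass to syrics, or None when the search finds nothing.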

Fetching lyrics by track id with syrics

Getting the Spotify user cookie (sp_dc)

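My Chrome is currently version 119.0.6045.160, where the value of a cookie can no longer be viewed directly in the cookie section of Settings. Instead, open https://open.spotify.com/ and press F12 to enter developer mode. In the Application tab, go to Storage in the left sidebar, choose Cookies, and select open.spotify.com. Then find sp_dc in the list of cookies and copy its value.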
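One way to avoid pasting the copied cookie directly into the script is to read it from an environment variable. This is only a sketch: the variable name SP_DC and the helper load_sp_dc are assumptions made for illustration, not something syrics requires.

```python
import os


def load_sp_dc(env_var: str = "SP_DC") -> str:
    """Read the sp_dc cookie from an environment variable, failing early if it is missing."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your sp_dc cookie value"
        )
    return value
```

The returned string can then be used wherever the sp_dc value is needed.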

Fetching the lyrics with the cookie and track id

Here I simply follow "Use as a module", the last section of the README at https://github.com/akashrchandran/syrics, to fetch the lyrics:

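from syrics.api import Spotify

# sp_dc holds the cookie value copied from the browser in the previous step
sp = Spotify(sp_dc)
# the call below returns the following dict
sp.get_lyrics("1OOtq8tRnDM8kG2gqUPjAj")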
{'lyrics': {'syncType': 'LINE_SYNCED',
'lines': [{'startTimeMs': '38190', 'words': "They told him don't you ever", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '39740', 'words': 'come around here', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '41250', 'words': "Don't wanna see your face", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '42600', 'words': 'you better disappear', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '44710', 'words': "The fire's in their eyes", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '46250', 'words': 'and the words are really clear', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '47930', 'words': 'So beat it just beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '52100', 'words': 'You better run', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '52890', 'words': 'you better do what you can', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '55110', 'words': "Don't wanna see no blood", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '56380', 'words': "don't be a macho man oh", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '58540', 'words': 'You wanna be tough', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '60020', 'words': 'better do what you can', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '61740', 'words': 'So beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '63500', 'words': 'but you wanna be bad', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '65180', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '67430', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '68890', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '72340', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '73830', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '75820', 'words': "It doesn't matter", 'syllables': [], 
'endTimeMs': '0'}, {'startTimeMs': '77520', 'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '79010', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '80760', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '82460', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '84270', 'words': 'Just beat it beat it oh', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '86630', 'words': "They're out to get you", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '87710', 'words': 'better leave while you can', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '89630', 'words': "Don't wanna be a boy", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '90910', 'words': 'you wanna be a man', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '93110', 'words': 'You wanna stay alive', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '94580', 'words': 'better do what you can', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '96310', 'words': 'So beat it da just beat it du', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '100470', 'words': 'You have to show them', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '101500', 'words': "that you're really not scared", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '103510', 'words': "You're playing with your life", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '104710', 'words': "this ain't no truth or dare oh", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '106930', 'words': "They'll kick you", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '107520', 'words': 'then they beat you', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '108380', 'words': "Then they'll tell you it's fair", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '110090', 'words': 'So beat it da', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '111870', 'words': 'but you 
wanna be bad', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '113530', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '115740', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '117250', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '120740', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '122270', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '124140', 'words': "It doesn't matter", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '125870', 'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '127400', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '129570', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '131060', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '134510', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '136160', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '137990', 'words': "It doesn't matter", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '139720', 'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '141140', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '144230', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '151160', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '153740', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '160690', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '162050', 'words': '♪', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '197020', 'words': 'Beat it beat it', 'syllables': [], 'endTimeMs': 
'0'}, {'startTimeMs': '198790', 'words': 'Beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '200250', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '203580', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '205260', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '207100', 'words': "It doesn't matter", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '208870', 'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '210260', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '212550', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '214020', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '216170', 'words': 'oh no', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '217480', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '218930', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '220900', 'words': "It doesn't matter", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '222650', 'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '224140', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '226330', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '227860', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '229980', 'words': 'oh no', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '231300', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '232920', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '234750', 'words': "It doesn't matter", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '236500', 
'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '237950', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '240180', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '241660', 'words': 'No one wants to be defeated', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '245040', 'words': 'Showing how funky', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '246700', 'words': 'and strong is your fight', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '248590', 'words': "It doesn't matter", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '250310', 'words': "who's wrong or right", 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '251790', 'words': 'Just beat it beat it', 'syllables': [], 'endTimeMs': '0'}, {'startTimeMs': '254040', 'words': 'beat it beat it', 'syllables': [], 'endTimeMs': '0'}],
'provider': 'syncpower',
'providerLyricsId': '280721',
'providerDisplayName': 'プチリリ',
'syncLyricsUri': '', 'isDenseTypeface': False,
'alternatives': [], 'language': '',
'isRtlLanguage': False,
'fullscreenAction': 'FULLSCREEN_LYRICS',
'showUpsell': False},
'colors': {'background': -1694671, 'text': -16777216, 'highlightText': -1},
'hasVocalRemoval': False}

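The sp_dc passed to Spotify(...) is the cookie obtained earlier; just fill it into the program. The output above is the lyrics of Michael Jackson's Beat It. Note that it contains timestamps, which is exactly what we need. Let's look at the format of the returned data: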

{
    'lyrics': {
        'syncType': 'LINE_SYNCED',
        'lines': [
            {'startTimeMs': '38190',
             'words': "They told him don't you ever",
             'syllables': [],
             'endTimeMs': '0'},
            {'startTimeMs': '39740',
             'words': 'come around here',
             'syllables': [],
             'endTimeMs': '0'},
            ....
        ],
        'provider': 'syncpower',
        'providerLyricsId': '280721',
        'providerDisplayName': 'プチリリ',
        'syncLyricsUri': '',
        'isDenseTypeface': False,
        'alternatives': [],
        'language': '',
        'isRtlLanguage': False,
        'fullscreenAction': 'FULLSCREEN_LYRICS',
        'showUpsell': False
    },
    'colors': {'background': -1694671, 'text': -16777216, 'highlightText': -1},
    'hasVocalRemoval': False
}

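The returned JSON (already a Python dict) consists of three parts: lyrics, colors, and hasVocalRemoval. The timestamped lyrics we need are in lines under lyrics. This is a list with one entry per timestamp, and each entry is a dict containing the start and end time of the line (startTimeMs, endTimeMs) and its text (words). This is the information we need.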

Next, the following code extracts just that part:

metadata = sp.get_lyrics("1OOtq8tRnDM8kG2gqUPjAj")
# type(metadata) is dict, so it can be indexed directly
lyrics = metadata["lyrics"]["lines"]  # list of per-line dicts with timestamps
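To illustrate how the extracted list might be used, here is a minimal sketch that turns each entry into an LRC-style line of the form [mm:ss.xx]words. The LRC output format and the helper names are choices made for this example, not part of syrics. The sample data is the first two entries from the response shown earlier. Since startTimeMs arrives as a string, it is converted to int first.

```python
def to_lrc_timestamp(ms: int) -> str:
    """Format a millisecond offset as an LRC tag, e.g. 38190 -> [00:38.19]."""
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"


def lines_to_lrc(lines: list) -> str:
    """Join the per-line dicts into LRC text, one timestamped line per entry."""
    out = []
    for line in lines:
        start_ms = int(line["startTimeMs"])  # timestamps are strings in the response
        out.append(f"{to_lrc_timestamp(start_ms)}{line['words']}")
    return "\n".join(out)


# First two entries of the response shown earlier
sample = [
    {"startTimeMs": "38190", "words": "They told him don't you ever",
     "syllables": [], "endTimeMs": "0"},
    {"startTimeMs": "39740", "words": "come around here",
     "syllables": [], "endTimeMs": "0"},
]

print(lines_to_lrc(sample))
# [00:38.19]They told him don't you ever
# [00:39.74]come around here
```

Only startTimeMs is used here because endTimeMs is '0' for every line in the sample response.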

http://example.com/2023/11/19/How-to-crawl-lyrics-from-spotify-with-syrics-and-spotipy/
Author: iMusic
Posted on: November 19, 2023