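I once built a music website, and I downloaded every song by hand and uploaded it to the site, which was very tedious.
After learning HttpClient and Jsoup, I decided to write a simple crawler that collects music information and downloads the music.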
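To start, I tried a few simple features:
Basic features
1. Crawl music by song name or singer name
2. View the crawled music information
3. Download MP3 files to the local computer
4. Deduplicate the music in the database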
1. Crawl music by song name or singer name
```java
@GetMapping("/{search}")
public Result downloadMusic(@PathVariable("search") String search) {
    // Crawl the search results page and return how many songs were found
    List<Music> musicList = musicSearchTask.searchByMusicNameOrSingerName("https://www.xzmp3.com", search);
    return Result.ok(musicList.size());
}
```
```java
/**
 * Crawl song name, singer and download-page URL from mainUrl for the given
 * search term, and save them to the database.
 * @param mainUrl site root, e.g. https://www.xzmp3.com
 * @param search  song name or singer name
 */
public List<Music> searchByMusicNameOrSingerName(String mainUrl, String search) {
    String searchString = mainUrl + "/xiazai/" + search + ".htm";
    log.info("Search URL: " + searchString);
    String html = httpUtils.doGetHtml(searchString);
    Document doc = Jsoup.parse(html);
    // Each search result is a div.return_item inside .list.return_list
    Elements elements = doc.select(".list.return_list>div.return_item");
    List<Music> musicList = new ArrayList<>();
    for (Element element : elements) {
        Music m = new Music();
        String musicName = element.select(".single_line.item.name").text();
        String singerName = element.select(".single_line.item.singer").text();
        // The song-name link holds a site-relative path to the download page
        String downloadUrl = mainUrl + element.select(".single_line.item.name").attr("href");
        m.setMusicName(musicName);
        m.setSingerName(singerName);
        m.setDownloadUrl(downloadUrl);
        musicList.add(m);
        log.info("Parsed: " + m);
    }
    // Nothing to save if the page had no results
    if (musicList.isEmpty()) {
        log.info("No music found for: " + search);
        return musicList;
    }
    boolean b = musicServerimpl.saveBatch(musicList);
    if (b) {
        log.info("Saved to database");
    } else {
        log.error("Failed to save to database");
    }
    return musicList;
}
```
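The search term is concatenated into the URL as-is. A space or other reserved character makes the URL invalid (Apache HttpClient's `HttpGet(String)` parses its argument with `URI.create`, which rejects a space), and non-ASCII song names are safest sent percent-encoded. Below is a minimal sketch of building the search URL with UTF-8 percent-encoding, assuming the site decodes this path segment as UTF-8 (not verified against the site); `SearchUrlBuilder` and `buildSearchUrl` are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrlBuilder {
    // Builds "{mainUrl}/xiazai/{search}.htm" with the search term percent-encoded as UTF-8.
    // Assumption: the site accepts UTF-8 percent-encoding in this path segment.
    static String buildSearchUrl(String mainUrl, String search) {
        // URLEncoder does form encoding: a literal '+' becomes "%2B" and a space becomes '+'.
        // In a URL path a space must be "%20", so only the remaining '+' signs are replaced.
        String encoded = URLEncoder.encode(search.trim(), StandardCharsets.UTF_8)
                .replace("+", "%20");
        return mainUrl + "/xiazai/" + encoded + ".htm";
    }
}
```

If it works against the site, `searchByMusicNameOrSingerName` could call `buildSearchUrl(mainUrl, search)` instead of concatenating the strings.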
2. View the crawled music information
```java
@PostMapping("findAll")
public Result findAll() {
    List<Music> list = musicServerimpl.list();
    return Result.ok(list);
}
```
3. Download MP3 files to the local computer
```java
/**
 * Download the MP3 at url and save it locally under a random UUID file name.
 * @return the saved file name, or "" if the download failed
 */
public String doGetMP3(String url) {
    if (!url.endsWith("mp3")) {
        log.info("Download failed: not an .mp3 URL");
        return "";
    }
    // HttpClient backed by the shared connection manager; not closed here,
    // because closing it would also shut down the shared manager
    CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(this.cm).build();
    // GET request
    HttpGet httpGet = new HttpGet(url);
    httpGet.setConfig(this.getConfig());
    // try-with-resources closes the response and releases the connection
    try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
        log.info("Response status: " + response.getStatusLine().getStatusCode());
        log.info("Response entity: " + response.getEntity());
        if (response.getStatusLine().getStatusCode() == 200) {
            // Check that there is a response body
            if (response.getEntity() != null) {
                String mp3Name = UUID.randomUUID().toString() + ".mp3";
                // Write the MP3 to disk; the stream is closed automatically
                try (OutputStream outputStream =
                         new FileOutputStream("C:\\Users\\彭禹豪\\Desktop\\images\\" + mp3Name)) {
                    response.getEntity().writeTo(outputStream);
                }
                log.info("Download succeeded: " + mp3Name);
                return mp3Name;
            }
            log.info("Download failed: no MP3 content");
            return "";
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    log.info("Download failed: request problem or no download permission");
    return "";
}
```
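`url.endsWith("mp3")` accepts any URL whose last three characters are `mp3` (for example `.../getmp3`) and rejects links such as `song.mp3?token=...` or `SONG.MP3`. Whether the site's real MP3 links carry query strings is not known here. The sketch below checks only the path component of the URL; `Mp3UrlCheck` and `isMp3Url` are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;

class Mp3UrlCheck {
    // Returns true if the URL's path (ignoring any query string or fragment) ends with ".mp3".
    static boolean isMp3Url(String url) {
        try {
            // getPath() is null for opaque URIs such as "mailto:x"
            String path = URI.create(url).getPath();
            return path != null && path.toLowerCase(Locale.ROOT).endsWith(".mp3");
        } catch (IllegalArgumentException e) {
            // URI.create throws IllegalArgumentException for malformed URLs
            return false;
        }
    }
}
```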
4. Deduplicate the music in the database
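```xml
<!-- Keep only the row with the largest music_id for each (music_name, singer_name) pair.
     The extra derived table "temp" works around MySQL error 1093, which forbids selecting
     from the table being deleted from in a subquery. -->
<delete id="musicDeduplication">
    delete from music
    where music_id not in (
        select * from (
            select max(music_id) from music group by music_name, singer_name
        ) temp
    )
</delete>
```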
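To see exactly which rows the SQL keeps, here is the same rule applied to an in-memory list: group by (music name, singer name) and keep the largest id in each group. `MusicDedup` and its `Row` type are hypothetical stand-ins for the `Music` entity, carrying only the three columns the SQL uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MusicDedup {
    // Hypothetical stand-in for the Music entity: only the columns the SQL uses
    static final class Row {
        final long musicId;
        final String musicName;
        final String singerName;

        Row(long musicId, String musicName, String singerName) {
            this.musicId = musicId;
            this.musicName = musicName;
            this.singerName = singerName;
        }
    }

    // For each (musicName, singerName) group keep only the row with the largest musicId,
    // like "select max(music_id) ... group by music_name, singer_name"
    static List<Row> keepMaxIdPerSong(List<Row> rows) {
        // LinkedHashMap keeps groups in the order they were first seen
        Map<List<String>, Row> best = new LinkedHashMap<>();
        for (Row r : rows) {
            // Arrays.asList gives value-based equals/hashCode and, unlike List.of, allows nulls
            List<String> key = Arrays.asList(r.musicName, r.singerName);
            best.merge(key, r, (kept, candidate) -> candidate.musicId > kept.musicId ? candidate : kept);
        }
        return new ArrayList<>(best.values());
    }
}
```

Allowing null names mirrors SQL, where `GROUP BY` puts all NULLs in one group. A similar pass over `musicList` before `saveBatch` could keep duplicates out of the table in the first place.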
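For various reasons, I did not start from a seed URL and keep extracting new URLs from it to download all of the music information. Instead, I build a single URL from a song name or singer name, parse the songs on that page, and save them to the database. The MP3 address itself has not yet been extracted from the download page.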
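The seed-based approach mentioned here is typically a breadth-first crawl: keep a queue (the frontier) of URLs still to visit and a set of URLs already seen, so that each page is fetched once. Below is a minimal JDK-only sketch. `SeedCrawler` is hypothetical, and `linkExtractor` stands in for "download the page and collect its links" (in this project that could be `httpUtils.doGetHtml` plus Jsoup).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class SeedCrawler {
    // Breadth-first crawl from one seed URL; returns URLs in visit order, at most maxPages.
    // linkExtractor is a hypothetical hook returning the links found on a given page.
    static List<String> crawl(String seed, Function<String, List<String>> linkExtractor, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : linkExtractor.apply(url)) {
                // seen.add returns false for URLs already queued or visited, so each is fetched once
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

The `maxPages` cap keeps a test run from wandering across the whole site.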
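Next, I plan to learn the WebMagic framework: use its modular design to decouple these features, continuously discover new URLs and download music information, and extract the MP3 addresses asynchronously to download the music locally.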
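For the asynchronous download step, one option using only the JDK is to submit each MP3 URL as a `CompletableFuture` on a bounded thread pool. This is not WebMagic's API. `AsyncDownloader` is hypothetical, and `downloader` stands in for a method like `doGetMP3` that returns the saved file name or `""` on failure.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class AsyncDownloader {
    // Runs downloader on every URL concurrently on a fixed-size pool and waits for all of them.
    // Results come back in the same order as the input URLs.
    static List<String> downloadAll(List<String> urls, Function<String, String> downloader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = urls.stream()
                    .map(url -> CompletableFuture.supplyAsync(() -> downloader.apply(url), pool)
                            // an exception in one download becomes "" instead of failing the batch
                            .exceptionally(ex -> ""))
                    .collect(Collectors.toList());
            // join() blocks until each future completes; order follows the input list
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

The fixed pool size caps how many requests hit the site at once, which matters more for a crawler than raw throughput.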