pytube is a lightweight, Pythonic, dependency-free library (and command-line utility) for downloading YouTube videos.

Behold, a balance of simplicity and flexibility:

>>> from pytube import YouTube
>>> YouTube('https://youtu.be/9bZkp7q19f0').streams.first().download()
>>> yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
>>> (yt.streams
...     .filter(progressive=True, file_extension='mp4')
...     .order_by('resolution')
...     .desc()
...     .first()
...     .download())

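Features
Support for both progressive and DASH streams
Easily register on_download_progress and on_download_complete callbacks
Command-line interface included
Caption track support
Outputs caption tracks to .srt format (SubRip Subtitle)
Ability to capture the thumbnail URL
Extensively documented source code
No third-party dependencies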
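
The callback support listed above can be sketched roughly as follows. This is a minimal illustration rather than pytube's own example: `percent_done` and `on_progress` are hypothetical names, and because the argument list of the progress callback has differed between pytube releases, the sketch only assumes that the stream comes first and the remaining byte count comes last.

```python
def percent_done(total_bytes, bytes_remaining):
    """Return download progress as a percentage between 0 and 100."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * (total_bytes - bytes_remaining) / total_bytes


def on_progress(stream, *args):
    # Assumption: the last positional argument is the number of bytes remaining.
    bytes_remaining = args[-1]
    print("%.1f%% downloaded" % percent_done(stream.filesize, bytes_remaining))


# Possible registration (needs pytube installed and network access):
# from pytube import YouTube
# yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
# yt.register_on_progress_callback(on_progress)
```

Keeping the percentage calculation in its own function makes it easy to check without touching the network.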

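The pytube project
My daughter's kindergarten has a foreign teacher who recently needed the full set of teaching songs from "Singing Walrus Music" on YouTube. When the call for help went out in the parents' group chat, I, as the programmer dad, had to get it properly sorted out.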

GitHub repository
https://github.com/nficano/pytube

Documentation
https://python-pytube.readthedocs.io

Installation

pip install pytube

Quick start

from pytube import YouTube
YouTube('http://youtube.com/watch?v=9bZkp7q19f0').streams.first().download()

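According to the author, pytube's first() method picks the highest-resolution video for download, but in my own tests the result was disappointing.
YouTube is built on DASH streams. DASH splits video and audio into separate streams: for example, video at 480p and 720p, and audio at 44100 Hz and 22050 Hz sampling rates. The following code prints the description of every available stream, including the DASH Representations: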

yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
yt.streams.all()
 [<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
 <Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
 <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
 <Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
 <Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
 <Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
 <Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
 <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
 <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
 <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
 <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
 <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
 <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
 <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
 <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
 <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
 <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
 <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
 <Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
 <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
 <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
 <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]

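In this list, itag="22" is a 720p video file that includes audio (acodec="mp4a.40.2"), whereas itag="136" is also 720p but has no audio track.
Back to pytube's first() method: it favors streams that combine audio and video over video-only streams. In the extreme case, first() crudely picks a low-resolution combined stream and ignores the high-definition ones.
I rewrote the stream selection logic myself, as explained later.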
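
One way to avoid that pitfall is to rank the streams by resolution yourself. The sketch below is not the author's rewritten logic, which is not shown in this section. It works on plain dictionaries that mirror the attributes printed in the listing above, and `resolution_value` and `pick_highest_resolution` are hypothetical helper names.

```python
def resolution_value(res):
    """Convert a label such as '720p' to the integer 720; audio-only streams give 0."""
    if not res:
        return 0
    return int(res.split('p')[0])


def pick_highest_resolution(streams):
    """Return the stream with the highest resolution.

    On a tie, prefer a stream that already carries audio (acodec set).
    """
    return max(
        streams,
        key=lambda s: (resolution_value(s.get('res')), s.get('acodec') is not None),
    )


# Values copied from the stream listing above.
sample = [
    {'itag': 22, 'res': '720p', 'acodec': 'mp4a.40.2'},
    {'itag': 18, 'res': '360p', 'acodec': 'mp4a.40.2'},
    {'itag': 137, 'res': '1080p', 'acodec': None},
    {'itag': 136, 'res': '720p', 'acodec': None},
    {'itag': 140, 'res': None, 'acodec': 'mp4a.40.2'},
]
best = pick_highest_resolution(sample)
print(best['itag'])  # prints 137
```

A video-only pick such as itag 137 has no sound, so an audio stream has to be downloaded separately.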
Filtering streams
pytube offers several ways to filter streams.
1. Traditional streams with audio and video combined
Set the parameter progressive=True

yt.streams.filter(progressive=True).all()
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
  <Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
  <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
  <Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
  <Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">]

2. DASH streams
Set the parameter adaptive=True

yt.streams.filter(adaptive=True).all()
 [<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
  <Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
  <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
  <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
  <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
  <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
  <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
  <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
  <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
  <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
  <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
  <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
  <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
  <Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
  <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
  <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
  <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
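
The difference between the two listings comes down to whether a stream carries both a video codec and an audio codec. A minimal sketch of that split follows, using (vcodec, acodec) pairs copied from the output above. `is_progressive` here is a hypothetical stand-alone function, not a call into pytube.

```python
def is_progressive(vcodec, acodec):
    """A stream is progressive when it contains both video and audio."""
    return vcodec is not None and acodec is not None


streams = {
    22: ('avc1.64001F', 'mp4a.40.2'),   # 720p with audio
    136: ('avc1.4d401f', None),         # 720p video only
    140: (None, 'mp4a.40.2'),           # audio only
}
progressive = [itag for itag, codecs in streams.items() if is_progressive(*codecs)]
adaptive = [itag for itag, codecs in streams.items() if not is_progressive(*codecs)]
print(progressive, adaptive)  # prints [22] [136, 140]
```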

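3. Other filter conditions

only_audio=True: audio-only streams
only_video=True: video-only streams
subtype='mp4': streams with the "mp4" extension, both audio and video
res="720p": video at 720p resolution
abr="64kbps": audio at a bitrate of 64 kbps
video_codec="vp9": video encoded with vp9
audio_codec="vorbis": audio encoded with vorbis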
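
Filters can be combined with a little post-processing. As a sketch, the hypothetical helper `best_audio` below takes a list such as the one returned by yt.streams.filter(only_audio=True).all() and picks the highest bitrate. It only assumes that each item has an `abr` attribute such as "128kbps", as in the listings above.

```python
from types import SimpleNamespace


def abr_value(stream):
    """Turn an abr label such as '128kbps' into the integer 128."""
    return int(stream.abr.replace('kbps', ''))


def best_audio(streams):
    """Return the audio stream with the highest bitrate, or None if the list is empty."""
    return max(streams, key=abr_value, default=None)


# Stand-ins for pytube Stream objects, using values from the listing above.
audio = [
    SimpleNamespace(itag=140, abr='128kbps'),
    SimpleNamespace(itag=249, abr='50kbps'),
    SimpleNamespace(itag=251, abr='160kbps'),
]
print(best_audio(audio).itag)  # prints 251
```

Comparing the bitrates as integers avoids the trap of string ordering, where "50kbps" would sort above "128kbps".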

Downloading by itag
YouTube assigns each type of stream its own id, called the itag.
The get_by_itag method returns the stream with a given itag:

yt.streams.get_by_itag(22)
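
Not every video offers every itag, so it may be worth checking the result before downloading. The sketch below assumes that get_by_itag returns None when the itag is missing, which appears to be pytube's behavior but should be verified against the installed version. `download_itag` is a hypothetical wrapper, and `yt` is a YouTube object created as above.

```python
def download_itag(yt, itag, output_path=None):
    """Download the stream with the given itag, or raise if the video lacks it."""
    stream = yt.streams.get_by_itag(itag)
    if stream is None:  # assumption: a missing itag yields None
        raise ValueError("itag %s is not available for this video" % itag)
    return stream.download(output_path=output_path)
```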

itag Code Container Content Resolution Bitrate Range VR / 3D
5 flv audio/video 240p – – –
6 flv audio/video 270p – – –
17 3gp audio/video 144p – – –
18 mp4 audio/video 360p – – –
22 mp4 audio/video 720p – – –
34 flv audio/video 360p – – –
35 flv audio/video 480p – – –
36 3gp audio/video 180p – – –
37 mp4 audio/video 1080p – – –
38 mp4 audio/video 3072p – – –
43 webm audio/video 360p – – –
44 webm audio/video 480p – – –
45 webm audio/video 720p – – –
46 webm audio/video 1080p – – –
82 mp4 audio/video 360p – – 3D
83 mp4 audio/video 480p – – 3D
84 mp4 audio/video 720p – – 3D
85 mp4 audio/video 1080p – – 3D
92 hls audio/video 240p – – 3D
93 hls audio/video 360p – – 3D
94 hls audio/video 480p – – 3D
95 hls audio/video 720p – – 3D
96 hls audio/video 1080p – – –
100 webm audio/video 360p – – 3D
101 webm audio/video 480p – – 3D
102 webm audio/video 720p – – 3D
132 hls audio/video 240p – –
133 mp4 video 240p – –
134 mp4 video 360p – –
135 mp4 video 480p – –
136 mp4 video 720p – –
137 mp4 video 1080p – –
138 mp4 video 2160p60 – –
139 m4a audio – 48k –
140 m4a audio – 128k –
141 m4a audio – 256k –
151 hls audio/video 72p – –
160 mp4 video 144p – –
167 webm video 360p – –
168 webm video 480p – –
169 webm video 1080p – –
171 webm audio – 128k –
218 webm video 480p – –
219 webm video 144p – –
242 webm video 240p – –
243 webm video 360p – –
244 webm video 480p – –
245 webm video 480p – –
246 webm video 480p – –
247 webm video 720p – –
248 webm video 1080p – –
249 webm audio – 50k –
250 webm audio – 70k –
251 webm audio – 160k –
264 mp4 video 1440p – –
266 mp4 video 2160p60 – –
271 webm video 1440p – –
272 webm video 4320p – –
278 webm video 144p – –
298 mp4 video 720p60 – –
299 mp4 video 1080p60 – –
302 webm video 720p60 – –
303 webm video 1080p60 – –
308 webm video 1440p60 – –
313 webm video 2160p – –
315 webm video 2160p60 – –
330 webm video 144p60 – hdr
331 webm video 240p60 – hdr
332 webm video 360p60 – hdr
333 webm video 480p60 – hdr
334 webm video 720p60 – hdr
335 webm video 1080p60 – hdr
336 webm video 1440p60 – hdr
337 webm video 2160p60 – hdr
394 mp4 video 144p – –
395 mp4 video 240p – –
396 mp4 video 360p – –
397 mp4 video 480p – –
398 mp4 video 720p – –
399 mp4 video 1080p – –
400 mp4 video 1440p – –
401 mp4 video 2160p – –
402 mp4 video 2880p – –
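
The table lends itself to a small lookup. The sketch below copies a handful of its rows into a dictionary so that a script can report what an itag means before downloading. The `ITAGS` name and the `describe` helper are hypothetical.

```python
# A few rows copied from the table above: itag -> (container, content, quality).
ITAGS = {
    18: ('mp4', 'audio/video', '360p'),
    22: ('mp4', 'audio/video', '720p'),
    137: ('mp4', 'video', '1080p'),
    140: ('m4a', 'audio', '128k'),
    251: ('webm', 'audio', '160k'),
}


def describe(itag):
    """Return a short human-readable description of a known itag."""
    if itag not in ITAGS:
        return 'itag %d: unknown' % itag
    container, content, quality = ITAGS[itag]
    return 'itag %d: %s %s (%s)' % (itag, quality, content, container)


print(describe(22))   # prints itag 22: 720p audio/video (mp4)
print(describe(999))  # prints itag 999: unknown
```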
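About the network
Because we must be shielded from the corrupting influence of Western capitalist thought, requests to YouTube are often unstable.
Two errors are common:
HTTPError
URLError
PyCharm users may also run into ConnectionResetError.
The first two need the import: from urllib.error import HTTPError, URLError
Then retry the request with a while loop and try ... except: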

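import logging
from urllib.error import HTTPError, URLError
from pytube import YouTube

logger = logging.getLogger(__name__)
url = 'http://youtube.com/watch?v=9bZkp7q19f0'  # the video URL used earlier in this post

yt = None
while True:  # retry until the request succeeds
    try:
        yt = YouTube(url)
        break
    except HTTPError:  # must come before URLError, its parent class
        logger.error("Request failed once: HTTPError")
        continue
    except URLError:
        logger.error("Request failed once: URLError")
        continue
streams = yt.streams.filter(subtype='mp4').all()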
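
The loop above retries forever, which can hang a script when the network is down for good. A bounded variant with a pause between attempts might look like this. It is only a sketch: `fetch_with_retry`, `max_attempts`, `delay` and `sleep` are hypothetical names, and the call to retry is passed in as a function so the same helper works for YouTube(url) or any other request.

```python
import time
from urllib.error import HTTPError, URLError


def fetch_with_retry(make_request, max_attempts=5, delay=2.0, sleep=time.sleep):
    """Call make_request() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return make_request()
        except (HTTPError, URLError, ConnectionResetError) as exc:
            last_error = exc
            print("Attempt %d failed: %s" % (attempt, type(exc).__name__))
            if attempt < max_attempts:
                sleep(delay)  # wait before trying again
    raise last_error


# Possible use with pytube (needs pytube installed and network access):
# from pytube import YouTube
# yt = fetch_with_retry(lambda: YouTube('http://youtube.com/watch?v=9bZkp7q19f0'))
```

Once the attempts are used up, the last error is raised again so the caller can decide what to do.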

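Downloading the video
Once you have settled on a stream that meets your criteria, download it directly with download:

from pytube import YouTube
yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
mp4 = yt.streams.first()
mp4.download(output_path, filename, filename_prefix)  # placeholder arguments, described below

download accepts three parameters:
output_path: the directory the video is written to;
filename: the name of the output file, which defaults to the video title and does not need an extension;
filename_prefix: a prefix for the file name, used mainly to tell audio and video apart. A downloaded audio stream and video stream share the same name and the same extension, so the later download overwrites the earlier one. A prefix keeps them distinct, for example "audio_FilmTitle.mp4" for the audio and "video_FilmTitle.mp4" for the video.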
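
The prefix idea can be sketched as follows. `download_pair` is a hypothetical helper. It assumes that `video_stream` and `audio_stream` are pytube Stream objects chosen with the filters described earlier, and it relies only on the output_path and filename_prefix parameters described in this section.

```python
def download_pair(video_stream, audio_stream, output_path):
    """Download a video-only and an audio-only stream without a name clash."""
    video_result = video_stream.download(output_path=output_path, filename_prefix='video_')
    audio_result = audio_stream.download(output_path=output_path, filename_prefix='audio_')
    # Hand back whatever download() returned for each stream.
    return video_result, audio_result
```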
