Model Zoo Summary¶
This page lists all the algorithms we support. Click a link to jump to the corresponding model page.
It also lists all the checkpoints we provide for each task. You can sort or search the checkpoints in each table, and click the corresponding link to open the model page for more details.
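The sorting and searching mentioned above happens interactively on the page. As a rough offline equivalent, the sketch below copies four rows from the Kinetics-400 table on this page into Python and filters them. The `Checkpoint` tuple and the helpers `search`, `sort_by_top1`, and `under_flops` are hypothetical names used for illustration only; they are not part of any library.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    """One table row (hypothetical container, for illustration only)."""
    model: str
    params_m: float  # Params (M)
    flops_g: float   # Flops (G)
    top1: float      # Top-1 (%)
    top5: float      # Top-5 (%)


# Four rows copied by hand from the Kinetics-400 table on this page.
ROWS = [
    Checkpoint("c2d_r50-in1k-pre_8xb32-8x8x1-100e_kinetics400-rgb",
               24.30, 19.00, 73.89, 91.21),
    Checkpoint("slowfast_r50_8xb8-8x8x1-256e_kinetics400-rgb",
               34.60, 66.10, 76.80, 92.99),
    Checkpoint("swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb",
               28.20, 88.00, 78.90, 93.77),
    Checkpoint("tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb",
               23.87, 32.88, 73.18, 90.56),
]


def search(rows: List[Checkpoint], keyword: str) -> List[Checkpoint]:
    """Keep rows whose model name contains the keyword (case-insensitive)."""
    return [r for r in rows if keyword.lower() in r.model.lower()]


def sort_by_top1(rows: List[Checkpoint]) -> List[Checkpoint]:
    """Order rows from highest to lowest Top-1 accuracy."""
    return sorted(rows, key=lambda r: r.top1, reverse=True)


def under_flops(rows: List[Checkpoint], max_flops_g: float) -> List[Checkpoint]:
    """Keep rows whose FLOPs do not exceed the given budget (in G)."""
    return [r for r in rows if r.flops_g <= max_flops_g]


if __name__ == "__main__":
    for row in sort_by_top1(under_flops(ROWS, 70.0)):
        print(f"{row.top1:6.2f}  {row.flops_g:7.2f}  {row.model}")
```

Chaining the helpers, as in the final loop, gives a budget-constrained ranking: with a 70 G budget the Swin row is dropped and the SlowFast row comes first.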
All supported algorithms¶
Number of papers: 35
Algorithm: 35
Number of checkpoints: 189
[Algorithm] Actor-Centric Relation Network (2 ckpts)
[Algorithm] Long-Term Feature Banks for Detailed Video Understanding (2 ckpts)
[Algorithm] SlowFast Networks for Video Recognition (7 ckpts)
[Algorithm] SlowFast Networks for Video Recognition (15 ckpts)
[Algorithm] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (2 ckpts)
[Algorithm] Non-local Neural Networks (4 ckpts)
[Algorithm] Learning Spatiotemporal Features with 3D Convolutional Networks (1 ckpt)
[Algorithm] Video Classification with Channel-Separated Convolutional Networks (3 ckpts)
[Algorithm] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (6 ckpts)
[Algorithm] MViTv2: Improved Multiscale Vision Transformers for Classification and Detection (9 ckpts)
[Algorithm] Omni-sourced Webly-supervised Learning for Video Recognition (1 ckpt)
[Algorithm] A Closer Look at Spatiotemporal Convolutions for Action Recognition (2 ckpts)
[Algorithm] SlowFast Networks for Video Recognition (5 ckpts)
[Algorithm] SlowFast Networks for Video Recognition (10 ckpts)
[Algorithm] Video Swin Transformer (6 ckpts)
[Algorithm] TAM: Temporal Adaptive Module for Video Recognition (3 ckpts)
[Algorithm] Is Space-Time Attention All You Need for Video Understanding (3 ckpts)
[Algorithm] Temporal Interlacing Network (3 ckpts)
[Algorithm] Temporal Pyramid Network for Action Recognition (3 ckpts)
[Algorithm] Temporal Relational Reasoning in Videos (2 ckpts)
[Algorithm] TSM: Temporal Shift Module for Efficient Video Understanding (12 ckpts)
[Algorithm] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (12 ckpts)
[Algorithm] UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning (3 ckpts)
[Algorithm] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer (23 ckpts)
[Algorithm] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (2 ckpts)
[Algorithm] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking (2 ckpts)
[Algorithm] X3D: Expanding Architectures for Efficient Video Recognition (2 ckpts)
[Algorithm] Audiovisual SlowFast Networks for Video Recognition (1 ckpt)
[Algorithm] BMN: Boundary-Matching Network for Temporal Action Proposal Generation (2 ckpts)
[Algorithm] BSN: Boundary Sensitive Network for Temporal Action Proposal Generation (1 ckpt)
[Algorithm] CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (1 ckpt)
[Algorithm] Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition (8 ckpts)
[Algorithm] Revisiting Skeleton-based Action Recognition (7 ckpts)
[Algorithm] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition (16 ckpts)
[Algorithm] PYSKL: Towards Good Practices for Skeleton Action Recognition (8 ckpts)
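The totals at the top of this list can be checked against the per-entry counts. The sketch below is an illustration only: `checkpoint_count` and `COUNTS` are hypothetical names, and the counts were transcribed by hand from the 35 entries above, in order.

```python
import re

# Matches the trailing "(N ckpts)" or "(N ckpt)" of one list entry.
CKPT_PATTERN = re.compile(r"\((\d+) ckpts?\)\s*$")


def checkpoint_count(entry: str) -> int:
    """Return the checkpoint count at the end of one '[Algorithm] ...' line."""
    match = CKPT_PATTERN.search(entry)
    if match is None:
        raise ValueError(f"no checkpoint count in: {entry!r}")
    return int(match.group(1))


# Per-entry counts, transcribed in the order the list gives them.
COUNTS = [
    2, 2, 7, 15, 2, 4, 1, 3, 6, 9,
    1, 2, 5, 10, 6, 3, 3, 3, 3, 2,
    12, 12, 3, 23, 2, 2, 2, 1, 2, 1,
    1, 8, 7, 16, 8,
]

if __name__ == "__main__":
    sample = "[Algorithm] Video Swin Transformer (6 ckpts)"
    print(checkpoint_count(sample))   # 6
    print(len(COUNTS), sum(COUNTS))   # 35 189
```

The 35 entries sum to the stated 189 checkpoints. Note that 35 is the number of list entries: the SlowFast and VideoMAE paper titles each appear more than once, with different checkpoint counts.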
Action Recognition¶
Kinetics-400¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Readme |
---|---|---|---|---|---|
c2d_r50-in1k-pre-nopool_8xb32-8x8x1-100e_kinetics400-rgb | 24.30 | 33.00 | 73.44 | 91.0 | |
c2d_r101-in1k-pre-nopool_8xb32-8x8x1-100e_kinetics400-rgb | 43.30 | 63.00 | 74.97 | 91.77 | |
c2d_r50-in1k-pre_8xb32-8x8x1-100e_kinetics400-rgb | 24.30 | 19.00 | 73.89 | 91.21 | |
c2d_r50-in1k-pre_8xb32-16x4x1-100e_kinetics400-rgb | 24.30 | 39.00 | 74.97 | 91.91 | |
ircsn_ig65m-pretrained-r152_8xb12-32x2x1-58e_kinetics400-rgb | 29.70 | 97.63 | 82.87 | 95.9 | |
ircsn_ig65m-pretrained-r152-bnfrozen_8xb12-32x2x1-58e_kinetics400-rgb | 29.70 | 97.63 | 82.84 | 95.92 | |
ircsn_ig65m-pretrained-r50-bnfrozen_8xb12-32x2x1-58e_kinetics400-rgb | 13.13 | 55.90 | 79.44 | 94.26 | |
ipcsn_r152_32x2x1-180e_kinetics400-rgb | 33.02 | 109.90 | 77.80 | 93.1 | |
ircsn_r152_32x2x1-180e_kinetics400-rgb | 29.70 | 97.63 | 76.53 | 92.28 | |
ipcsn_ig65m-pretrained-r152-bnfrozen_32x2x1-58e_kinetics400-rgb | 33.02 | 109.90 | 82.68 | 95.69 | |
ipcsn_sports1m-pretrained-r152-bnfrozen_32x2x1-58e_kinetics400-rgb | 33.02 | 109.90 | 79.07 | 93.82 | |
ircsn_sports1m-pretrained-r152-bnfrozen_32x2x1-58e_kinetics400-rgb | 33.02 | 109.90 | 78.57 | 93.44 | |
i3d_imagenet-pretrained-r50-nl-dot-product_8xb8-32x2x1-100e_kinetics400-rgb | 35.40 | 59.30 | 74.80 | 92.07 | |
i3d_imagenet-pretrained-r50-nl-embedded-gaussian_8xb8-32x2x1-100e_kinetics400-rgb | 35.40 | 59.30 | 74.73 | 91.8 | |
i3d_imagenet-pretrained-r50-nl-gaussian_8xb8-32x2x1-100e_kinetics400-rgb | 31.70 | 56.50 | 73.97 | 91.33 | |
i3d_imagenet-pretrained-r50_8xb8-32x2x1-100e_kinetics400-rgb | 28.00 | 43.50 | 73.47 | 91.27 | |
i3d_imagenet-pretrained-r50_8xb8-dense-32x2x1-100e_kinetics400-rgb | 28.00 | 43.50 | 73.77 | 91.35 | |
i3d_imagenet-pretrained-r50-heavy_8xb8-32x2x1-100e_kinetics400-rgb | 33.00 | 166.30 | 76.21 | 92.48 | |
mvit-small-p244_32xb16-16x4x1-200e_kinetics400-rgb_infer | | | 81.10 | 94.7 | |
mvit-small-p244_32xb16-16x4x1-200e_kinetics400-rgb | 34.50 | 64.00 | 80.60 | 94.7 | |
mvit-base-p244_32x3x1_kinetics400-rgb | | | 81.10 | 94.7 | |
mvit-large-p244_40x3x1_kinetics400-rgb | | | 81.10 | 94.7 | |
mvit-small-p244_k400-maskfeat-pre_8xb32-16x4x1-100e_kinetics400-rgb | 36.40 | 71.00 | 81.80 | 95.2 | |
slowonly_r50_8xb16-8x8x1-256e_imagenet-kinetics400-rgb | 32.45 | 54.75 | 77.30 | 93.23 | |
r2plus1d_r34_8xb8-8x8x1-180e_kinetics400-rgb | 63.80 | 53.10 | 69.76 | 88.41 | |
r2plus1d_r34_8xb8-32x2x1-180e_kinetics400-rgb | 63.80 | 213.00 | 75.46 | 92.28 | |
slowfast_r50_8xb8-4x16x1-256e_kinetics400-rgb | 34.50 | 36.30 | 75.55 | 92.35 | |
slowfast_r50_8xb8-8x8x1-256e_kinetics400-rgb | 34.60 | 66.10 | 76.80 | 92.99 | |
slowfast_r50_8xb8-8x8x1-steplr-256e_kinetics400-rgb | 34.60 | 66.10 | 76.65 | 92.86 | |
slowfast_r101_8xb8-8x8x1-256e_kinetics400-rgb | 62.90 | 126.00 | 78.65 | 93.88 | |
slowfast_r101-r50_32xb8-4x16x1-256e_kinetics400-rgb | 62.40 | 64.90 | 77.03 | 92.99 | |
slowonly_r50_8xb16-4x16x1-256e_kinetics400-rgb | 32.45 | 27.38 | 72.68 | 90.68 | |
slowonly_r50_8xb16-8x8x1-256e_kinetics400-rgb | 32.45 | 54.75 | 74.82 | 91.8 | |
slowonly_r101_8xb16-8x8x1-196e_kinetics400-rgb | 60.36 | 112.00 | 76.28 | 92.7 | |
slowonly_imagenet-pretrained-r50_8xb16-4x16x1-steplr-150e_kinetics400-rgb | 32.45 | 27.38 | 74.83 | 91.6 | |
slowonly_imagenet-pretrained-r50_8xb16-8x8x1-steplr-150e_kinetics400-rgb | 32.45 | 54.75 | 75.96 | 92.4 | |
slowonly_r50-in1k-pre-nl-embedded-gaussian_8xb16-4x16x1-steplr-150e_kinetics400-rgb | 39.81 | 43.23 | 74.84 | 91.41 | |
slowonly_r50-in1k-pre-nl-embedded-gaussian_8xb16-8x8x1-steplr-150e_kinetics400-rgb | 39.81 | 96.66 | 76.35 | 92.18 | |
swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb | 28.20 | 88.00 | 78.90 | 93.77 | |
swin-small-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb | 49.80 | 166.00 | 80.54 | 94.46 | |
swin-base-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb | 88.00 | 282.00 | 80.57 | 94.49 | |
swin-large-p244-w877_in22k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb | 197.00 | 604.00 | 83.46 | 95.91 | |
tanet_imagenet-pretrained-r50_8xb8-dense-1x1x8-100e_kinetics400-rgb | 25.60 | 43.00 | 76.25 | 92.41 | |
timesformer_divST_8xb8-8x32x1-15e_kinetics400-rgb | | 196.00 | 77.69 | 93.45 | |
timesformer_jointST_8xb8-8x32x1-15e_kinetics400-rgb | | 180.00 | 76.95 | 93.28 | |
timesformer_spaceOnly_8xb8-8x32x1-15e_kinetics400-rgb | | 141.00 | 76.93 | 92.88 | |
tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb | 24.36 | 32.97 | 71.86 | 90.44 | |
tpn-slowonly_r50_8xb8-8x8x1-150e_kinetics400-rgb | 91.50 | 66.01 | 74.20 | 91.48 | |
tpn-slowonly_imagenet-pretrained-r50_8xb8-8x8x1-150e_kinetics400-rgb | 91.50 | 66.01 | 76.74 | 92.57 | |
tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb | 23.87 | 32.88 | 73.18 | 90.56 | |
tsm_imagenet-pretrained-r50_8xb16-1x1x8-100e_kinetics400-rgb | 23.87 | 32.88 | 73.22 | 90.22 | |
tsm_imagenet-pretrained-r50_8xb16-1x1x16-50e_kinetics400-rgb | 23.87 | 65.75 | 75.12 | 91.55 | |
tsm_imagenet-pretrained-r50_8xb16-dense-1x1x8-50e_kinetics400-rgb | 23.87 | 32.88 | 73.38 | 90.78 | |
tsm_imagenet-pretrained-r50-nl-embedded-gaussian_8xb16-1x1x8-50e_kinetics400-rgb | 31.68 | 61.30 | 74.34 | 91.23 | |
tsm_imagenet-pretrained-r50-nl-dot-product_8xb16-1x1x8-50e_kinetics400-rgb | 31.68 | 61.30 | 74.49 | 91.15 | |
tsm_imagenet-pretrained-r50-nl-gaussian_8xb16-1x1x8-50e_kinetics400-rgb | 28.00 | 59.06 | 73.66 | 90.99 | |
tsm_imagenet-pretrained-mobileone-s4_8xb16-1x1x16-50e_kinetics400-rgb | 13.72 | 48.65 | 74.38 | 91.71 | |
tsm_imagenet-pretrained-r101_8xb16-1x1x8-50e_sthv2-rgb | 2.74 | 3.27 | 63.70 | 88.28 | |
tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb | 24.33 | 102.70 | 72.83 | 90.65 | |
tsn_imagenet-pretrained-r50_8xb32-1x1x5-100e_kinetics400-rgb | 24.33 | 102.70 | 73.80 | 91.21 | |
tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb | 24.33 | 102.70 | 74.12 | 91.34 | |
tsn_imagenet-pretrained-r50_8xb32-dense-1x1x5-100e_kinetics400-rgb | 24.33 | 102.70 | 71.37 | 89.67 | |
tsn_imagenet-pretrained-r101_8xb32-1x1x8-100e_kinetics400-rgb | 43.32 | 195.80 | 75.89 | 92.07 | |
tsn_imagenet-pretrained-rn101-32x4d_8xb32-1x1x3-100e_kinetics400-rgb | 42.95 | 200.30 | 72.95 | 90.36 | |
tsn_imagenet-pretrained-dense161_8xb32-1x1x3-100e_kinetics400-rgb | 27.36 | 194.60 | 72.07 | 90.15 | |
tsn_imagenet-pretrained-swin-transformer_8xb32-1x1x3-100e_kinetics400-rgb | 87.15 | 386.70 | 77.03 | 92.61 | |
tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb | 87.15 | 386.70 | 79.22 | 94.2 | |
tsn_imagenet-pretrained-mobileone-s4_8xb32-1x1x8-100e_kinetics400-rgb | 13.72 | 76.00 | 73.65 | 91.32 | |
tsn_imagenet-pretrained-r50_8xb32-1x1x8-50e_sthv2-rgb | 23.87 | 102.70 | 35.51 | 67.09 | |
tsn_imagenet-pretrained-r50_8xb32-1x1x16-50e_sthv2-rgb | 23.87 | 102.70 | 36.91 | 68.77 | |
uniformer-small_imagenet1k-pre_16x4x1_kinetics400-rgb | | | 80.90 | 94.6 | |
uniformer-base_imagenet1k-pre_16x4x1_kinetics400-rgb | | | 82.00 | 95.0 | |
uniformer-base_imagenet1k-pre_32x4x1_kinetics400-rgb | | | 83.10 | 95.3 | |
uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics400-rgb | | | 84.30 | 96.4 | |
uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics400-rgb | | | 85.80 | 97.1 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics400-rgb | | | 88.70 | 98.1 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics400-rgb | | | 89.00 | 98.2 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics400-rgb | | | 89.30 | 98.2 | |
uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics400-rgb | | | 89.50 | 98.4 | |
uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics700-rgb | | | 82.10 | 96.0 | |
vit-base-p16_videomae-k400-pre_16x4x1_kinetics-400 | | | 81.30 | 95.0 | |
vit-large-p16_videomae-k400-pre_16x4x1_kinetics-400 | | | 85.30 | 96.7 | |
vit-small-p16_videomaev2-vit-g-dist-k710-pre_16x4x1_kinetics-400 | | | 83.60 | 96.3 | |
vit-base-p16_videomaev2-vit-g-dist-k710-pre_16x4x1_kinetics-400 | | | 86.60 | 97.3 | |
x3d_s_13x6x1_facebook-kinetics400-rgb | 3.79 | 2.97 | 73.30 | | |
x3d_m_16x5x1_facebook-kinetics400-rgb | 3.79 | 6.49 | 76.40 | | |
tsn_r18_8xb320-64x1x1-100e_kinetics400-audio-feature | 11.40 | 0.37 | 13.70 | 27.3 | |
UCF101¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Readme |
---|---|---|---|---|---|
c3d_sports1m-pretrained_8xb30-16x1x1-45e_ucf101-rgb | 78.40 | 38.50 | 83.08 | 95.93 | |
SthV2¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top 1 Accuracy (efficient) | Top-5 (%) | Top 5 Accuracy (efficient) | Readme |
---|---|---|---|---|---|---|---|
mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb_infer | | | 68.10 | | 91.00 | | |
mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb | 34.40 | 64.00 | 68.20 | | 91.30 | | |
mvit-base-p244_u32_sthv2-rgb | | | 70.80 | | 92.70 | | |
mvit-large-p244_u40_sthv2-rgb | | | 73.20 | | 94.00 | | |
tin_imagenet-pretrained-r50_8xb6-1x1x8-40e_sthv2-rgb | 23.90 | 32.96 | 54.78 | | 82.18 | | |
trn_imagenet-pretrained-r50_8xb16-1x1x8-50e_sthv2-rgb | | 42.94 | 51.20 | 47.65 | 78.42 | 76.27 | |
tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_sthv2-rgb | 23.87 | 32.88 | 62.72 | | 87.70 | | |
tsm_imagenet-pretrained-r50_8xb16-1x1x16-50e_sthv2-rgb | 23.87 | 65.75 | 64.16 | | 88.61 | | |
tsm_imagenet-pretrained-r101_8xb16-1x1x8-50e_sthv2-rgb | 42.86 | 62.66 | 63.70 | | 88.28 | | |
Kinetics-700¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Readme |
---|---|---|---|---|---|
slowonly_imagenet-pretrained-r50_16xb16-4x16x1-steplr-150e_kinetics700-rgb | 32.45 | 27.38 | 65.18 | 86.05 | |
slowonly_imagenet-pretrained-r50_16xb16-8x8x1-steplr-150e_kinetics700-rgb | 32.45 | 54.75 | 66.93 | 87.47 | |
swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb | 197.00 | 604.00 | 75.92 | 92.72 | |
uniformerv2-base-p16-res224_clip-pre_8xb32-u8_kinetics700-rgb | | | 75.90 | 92.90 | |
uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb | | | 76.30 | 92.90 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics700-rgb | | | 80.80 | 95.20 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics700-rgb | | | 81.20 | 95.60 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics700-rgb | | | 81.40 | 95.70 | |
Kinetics-710¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Readme |
---|---|---|---|---|---|
slowonly_imagenet-pretrained-r50_32xb8-8x8x1-steplr-150e_kinetics710-rgb | 32.45 | 54.75 | 72.39 | 90.60 | |
swin-small-p244-w877_in1k-pre_32xb4-amp-32x2x1-30e_kinetics710-rgb | 197.00 | 604.00 | 76.90 | 92.96 | |
SthV1¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top 1 Accuracy (efficient) | Top-5 (%) | Top 5 Accuracy (efficient) | Readme |
---|---|---|---|---|---|---|---|
tanet_imagenet-pretrained-r50_8xb8-1x1x8-50e_sthv1-rgb | 25.10 | 43.10 | 49.71 | 46.98 | 77.43 | 75.75 | |
tanet_imagenet-pretrained-r50_8xb6-1x1x16-50e_sthv1-rgb | 25.10 | 86.10 | 50.95 | 48.24 | 79.28 | 78.16 | |
tin_imagenet-pretrained-r50_8xb6-1x1x8-40e_sthv1-rgb | 23.90 | 32.96 | 38.68 | | 68.55 | | |
tpn-tsm_imagenet-pretrained-r50_8xb8-1x1x8-150e_sthv1-rgb | 82.45 | 54.20 | 51.87 | | 79.67 | | |
trn_imagenet-pretrained-r50_8xb16-1x1x8-50e_sthv1-rgb | | 42.94 | 33.65 | 31.6 | 62.22 | 60.15 | |
Kinetics-600¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Readme |
---|---|---|---|---|---|
uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics600-rgb | | | 86.40 | 97.30 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics600-rgb | | | 89.00 | 98.30 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics600-rgb | | | 89.40 | 98.30 | |
uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics600-rgb | | | 89.20 | 98.30 | |
uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics600-rgb | | | 89.80 | 98.50 | |
Moments in Time V1¶
Model | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Readme |
---|---|---|---|---|---|
uniformerv2-base-p16-res224_clip-kinetics710-kinetics-k400-pre_16xb32-u8_mitv1-rgb | | | 42.30 | 71.50 | |
uniformerv2-large-p16-res224_clip-kinetics710-kinetics-k400-pre_u8_mitv1-rgb | | | 47.00 | 76.10 | |
uniformerv2-large-p16-res336_clip-kinetics710-kinetics-k400-pre_u8_mitv1-rgb | | | 47.70 | 76.80 | |
Action Detection¶
AVA v2.1¶
Model | Params (M) | Flops (G) | mAP | Readme |
---|---|---|---|---|
slowfast-acrn_kinetics400-pretrained-r50_8xb8-8x8x1-cosine-10e_ava21-rgb | | | 27.65 | |
slowonly-lfb-nl_kinetics400-pretrained-r50_8xb12-4x16x1-20e_ava21-rgb | | | 24.11 | |
slowonly-lfb-max_kinetics400-pretrained-r50_8xb12-4x16x1-20e_ava21-rgb | | | 22.15 | |
slowfast_kinetics400-pretrained-r50_8xb16-4x16x1-20e_ava21-rgb | | | 24.32 | |
slowfast_kinetics400-pretrained-r50-context_8xb16-4x16x1-20e_ava21-rgb | | | 25.34 | |
slowfast_kinetics400-pretrained-r50_8xb8-8x8x1-20e_ava21-rgb | | | 25.80 | |
slowonly_kinetics400-pretrained-r50_8xb16-4x16x1-20e_ava21-rgb | | | 20.72 | |
slowonly_kinetics700-pretrained-r50_8xb16-4x16x1-20e_ava21-rgb | | | 22.77 | |
slowonly_kinetics400-pretrained-r50-nl_8xb16-4x16x1-20e_ava21-rgb | | | 21.55 | |
slowonly_kinetics400-pretrained-r50-nl_8xb16-8x8x1-20e_ava21-rgb | | | 23.77 | |
slowonly_kinetics400-pretrained-r101_8xb16-8x8x1-20e_ava21-rgb | | | 24.83 | |
AVA v2.2¶
Model | Params (M) | Flops (G) | mAP | Readme |
---|---|---|---|---|
slowfast-acrn_kinetics400-pretrained-r50_8xb8-8x8x1-cosine-10e_ava22-rgb | | | 27.71 | |
slowfast_kinetics400-pretrained-r50_8xb6-8x8x1-cosine-10e_ava22-rgb | | | 25.90 | |
slowfast_kinetics400-pretrained-r50-temporal-max_8xb6-8x8x1-cosine-10e_ava22-rgb | | | 26.41 | |
slowfast_r50-k400-pre-temporal-max-focal-alpha3-gamma1_8xb6-8x8x1-cosine-10e_ava22-rgb | | | 26.65 | |
vit-base-p16_videomae-k400-pre_8xb8-16x4x1-20e-adamw_ava-kinetics-rgb | | | 33.60 | |
vit-large-p16_videomae-k400-pre_8xb8-16x4x1-20e-adamw_ava-kinetics-rgb | | | 38.70 | |
MultiSports¶
Model | Params (M) | Flops (G) | f-mAP | Readme |
---|---|---|---|---|
slowfast_kinetics400-pretrained-r50_8xb16-4x16x1-8e_multisports-rgb | | | 36.88 | |
slowonly_kinetics400-pretrained-r50_8xb16-4x16x1-8e_multisports-rgb | | | 26.40 | |
Skeleton-based Action Recognition¶
NTU60-XSub-2D¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
2s-agcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d | 3.50 | 4.40 | 88.60 | |
2s-agcn_8xb16-bone-u100-80e_ntu60-xsub-keypoint-2d | 3.50 | 4.40 | 91.59 | |
2s-agcn_8xb16-joint-motion-u100-80e_ntu60-xsub-keypoint-2d | 3.50 | 4.40 | 88.02 | |
2s-agcn_8xb16-bone-motion-u100-80e_ntu60-xsub-keypoint-2d | 3.50 | 4.40 | 88.82 | |
stgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d | 3.10 | 3.80 | 88.95 | |
stgcn_8xb16-bone-u100-80e_ntu60-xsub-keypoint-2d | 3.10 | 3.80 | 91.69 | |
stgcn_8xb16-joint-motion-u100-80e_ntu60-xsub-keypoint-2d | 3.10 | 3.80 | 86.90 | |
stgcn_8xb16-bone-motion-u100-80e_ntu60-xsub-keypoint-2d | 3.10 | 3.80 | 87.86 | |
stgcnpp_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d | 1.39 | 1.95 | 89.29 | |
stgcnpp_8xb16-bone-u100-80e_ntu60-xsub-keypoint-2d | 1.39 | 1.95 | 92.30 | |
stgcnpp_8xb16-joint-motion-u100-80e_ntu60-xsub-keypoint-2d | 1.39 | 1.95 | 87.30 | |
stgcnpp_8xb16-bone-motion-u100-80e_ntu60-xsub-keypoint-2d | 1.39 | 1.95 | 88.76 | |
NTU60-XSub-3D¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
2s-agcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-3d | 3.50 | 6.50 | 88.26 | |
2s-agcn_8xb16-bone-u100-80e_ntu60-xsub-keypoint-3d | 3.50 | 6.50 | 89.22 | |
2s-agcn_8xb16-joint-motion-u100-80e_ntu60-xsub-keypoint-3d | 3.50 | 6.50 | 86.73 | |
2s-agcn_8xb16-bone-motion-u100-80e_ntu60-xsub-keypoint-3d | 3.50 | 6.50 | 86.41 | |
stgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-3d | 3.10 | 5.70 | 88.11 | |
stgcn_8xb16-bone-u100-80e_ntu60-xsub-keypoint-3d | 3.10 | 5.70 | 88.76 | |
stgcn_8xb16-joint-motion-u100-80e_ntu60-xsub-keypoint-3d | 3.10 | 5.70 | 86.06 | |
stgcn_8xb16-bone-motion-u100-80e_ntu60-xsub-keypoint-3d | 3.10 | 5.70 | 85.49 | |
stgcnpp_8xb16-joint-u100-80e_ntu60-xsub-keypoint-3d | 1.40 | 2.96 | 89.14 | |
stgcnpp_8xb16-bone-u100-80e_ntu60-xsub-keypoint-3d | 1.40 | 2.96 | 90.21 | |
stgcnpp_8xb16-joint-motion-u100-80e_ntu60-xsub-keypoint-3d | 1.40 | 2.96 | 86.67 | |
stgcnpp_8xb16-bone-motion-u100-80e_ntu60-xsub-keypoint-3d | 1.40 | 2.96 | 87.45 | |
FineGYM¶
Model | Params (M) | Flops (G) | mean Top 1 Accuracy | Readme |
---|---|---|---|---|
slowonly_r50_8xb16-u48-240e_gym-keypoint | 2.00 | 20.60 | 93.50 | |
slowonly_r50_8xb16-u48-240e_gym-limb | 2.00 | 20.60 | 93.60 | |
NTU60-XSub¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
slowonly_r50_8xb16-u48-240e_ntu60-xsub-keypoint | 2.00 | 20.60 | 93.60 | |
slowonly_r50_8xb16-u48-240e_ntu60-xsub-limb | 2.00 | 20.60 | 93.50 | |
HMDB51¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
slowonly_kinetics400-pretrained-r50_8xb16-u48-120e_hmdb51-split1-keypoint | 3.00 | 14.60 | 69.60 | |
UCF101¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
slowonly_kinetics400-pretrained-r50_8xb16-u48-120e_ucf101-split1-keypoint | 3.10 | 14.60 | 86.80 | |
Kinetics-400¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
slowonly_r50_8xb32-u48-240e_k400-keypoint | 3.20 | 19.10 | 47.40 | |
NTU120-XSub-2D¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
stgcn_8xb16-joint-u100-80e_ntu120-xsub-keypoint-2d | 3.10 | 3.80 | 83.19 | |
stgcn_8xb16-bone-u100-80e_ntu120-xsub-keypoint-2d | 3.10 | 3.80 | 83.36 | |
stgcn_8xb16-joint-motion-u100-80e_ntu120-xsub-keypoint-2d | 3.10 | 3.80 | 78.87 | |
stgcn_8xb16-bone-motion-u100-80e_ntu120-xsub-keypoint-2d | 3.10 | 3.80 | 79.55 | |
NTU120-XSub-3D¶
Model | Params (M) | Flops (G) | Top-1 (%) | Readme |
---|---|---|---|---|
stgcn_8xb16-joint-u100-80e_ntu120-xsub-keypoint-3d | 3.10 | 5.70 | 82.15 | |
stgcn_8xb16-bone-u100-80e_ntu120-xsub-keypoint-3d | 3.10 | 5.70 | 84.28 | |
stgcn_8xb16-joint-motion-u100-80e_ntu120-xsub-keypoint-3d | 3.10 | 5.70 | 78.93 | |
stgcn_8xb16-bone-motion-u100-80e_ntu120-xsub-keypoint-3d | 3.10 | 5.70 | 80.02 | |
Video Retrieval¶
MSRVTT¶
Model | Params (M) | Flops (G) | MdR | MnR | Recall@1 | Recall@10 | Recall@5 | Readme |
---|---|---|---|---|---|---|---|---|
clip4clip_vit-base-p32-res224-clip-pre_8xb16-u12-5e_msrvtt-9k-rgb | | | 2.00 | 16.80 | 43.10 | 78.90 | 69.40 | |
Temporal Action Localization¶
ActivityNet v1.3¶
Model | Params (M) | Flops (G) | AR@1 | AR@10 | AR@100 | AR@5 | AUC | Readme |
---|---|---|---|---|---|---|---|---|
bmn_2xb8-400x100-9e_activitynet-feature | | | 32.89 | 56.64 | 75.29 | 49.43 | 67.25 | |
bsn_400x100_1xb16_20e_activitynet_feature (cuhk_mean_100) | | | 32.71 | 55.28 | 74.27 | 48.43 | 66.26 | |