y4mscaler(1) | y4mtools manual | y4mscaler(1) |
y4mscaler - Scale/crop/translate a YUV4MPEG2 stream
y4mscaler [options] < Y4Mstream > Y4Mstream
y4mscaler is a general-purpose video scaler which operates on YUV4MPEG2 streams, as produced and consumed by the MJPEGtools such as lav2yuv and mpeg2enc(1).
y4mscaler is meant to be used in a pipeline. Thus, input is from stdin, and output is to stdout.
The essential function of y4mscaler is to scale a specified "active" region of the input stream (the source) into a specified active region of the output stream (the target). Pixels outside of the active region of the source are ignored; pixels outside of the active region of the target are filled with a background color. The source may additionally have a matte applied to it; pixels outside the source matte are set to a separately specified background color.
y4mscaler correctly handles chroma subsampling, and thus it can also perform chroma subsampling conversions. The YUV4MPEG2 stream format supports three varieties of 4:2:0 subsampling, as well as 4:1:1, 4:2:2, 4:4:4, a 4:4:4 mode with an alpha channel, and a monochrome luma-only mode. (See "NOTES ON CHROMA MODES AND SUBSAMPLING".)
y4mscaler can perform simple interlacing conversions: switching from top-field-first to bottom-field-first and vice-versa (by lossily discarding the first field), and creating a progressive stream from interlaced by discarding every other field (effectively halving the vertical resolution).
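The interlaced-to-progressive conversion described above can be pictured with a short Python sketch. It is only an illustration, not y4mscaler's code: `keep_one_field` is a hypothetical helper, and it assumes the usual convention that row 0 of a frame belongs to the top field.

```python
def keep_one_field(frame_rows, keep_top=True):
    """Return only the rows of one field of an interlaced frame.

    Assumption: even-numbered rows (0, 2, 4, ...) form the top field and
    odd-numbered rows form the bottom field.
    """
    start = 0 if keep_top else 1
    return frame_rows[start::2]


# A 4-row frame: two rows from each field.
frame = ["top-0", "bottom-0", "top-1", "bottom-1"]
progressive = keep_one_field(frame)
print(len(frame), "rows in,", len(progressive), "rows out")
```

Only half of the rows survive, which is why the vertical resolution is halved.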
The source and target are defined by many, many parameters, but y4mscaler has many, many heuristics built-in to automagically set them appropriately. Most source parameters are taken from the input stream header. Remaining source and target parameters which are not specified by the user are guessed in a sane manner.
y4mscaler includes preset parameters for a number of common target streams: DVD, VideoCD (VCD), SuperVCD (SVCD), associated still image formats, and DV.
To create a stream appropriate for use in an SVCD:
To create a stream for a VideoCD (a non-interlaced format), from a DV source (an interlaced format), shifting the input frame 4 pixels to the left:
To take a widescreen NTSC DV source, and convert it to a letterboxed stream, with blue bars on the top and bottom:
To take a widescreen NTSC DV source, and convert it to a "fullscreen" stream (i.e. the sides are clipped, just like on TV):
To take a centered, letterboxed NTSC source, and convert it to a widescreen (16:9) format stream for DVD, with the black bars removed:
To take the center 100x100 pixel chunk of an NTSC DV stream, surround it with a 20-pixel blue border, and blow that up to a full-screen SuperVCD stream:
The first three options, -v, -V, and -h, are simple options which take either no arguments or one numeric argument.
The -I, -O, and -S options each take one argument of the form parameter=value, which specify parameters for the input, output, and scaling, respectively. These options can be used repeatedly to specify multiple parameters. The parameter names and values are not case-sensitive. Definitions of the form "parameter=[AAA|BBB|CCC]" mean that only one of the listed keywords AAA, BBB, or CCC may be chosen. Succeeding options will override earlier ones.
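The case-insensitivity and override rules can be illustrated with a small Python sketch. This is not y4mscaler's own parser; `collect_params` is a hypothetical name, and the sketch ignores any validation of parameter names or values.

```python
def collect_params(settings):
    """Fold repeated 'parameter=value' strings into one dict.

    Names and values are lower-cased, since they are not case-sensitive,
    and a later setting for a parameter replaces an earlier one.
    """
    params = {}
    for setting in settings:
        name, sep, value = setting.partition("=")
        if not sep or not name:
            raise ValueError(f"expected parameter=value, got {setting!r}")
        params[name.lower()] = value.lower()
    return params


# Settings as they might be given to three separate -O options.
target = collect_params(["preset=DVD", "SAR=NTSC_WIDE", "sar=pal_wide"])
print(target)
```

Here the second `sar` setting wins, so the collected target parameters are `preset=dvd` and `sar=pal_wide`.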
bg=RGB:r,g,b
bg=YCBCR:y,cb,cr
bg=RGBA:r,g,b,a
bg=YCBCRA:y,cb,cr,a
sar=N:D
sar=[NTSC|PAL|NTSC_WIDE|PAL_WIDE]
size=WxH
size=SRC
bg=RGB:r,g,b
bg=YCBCR:y,cb,cr
bg=RGBA:r,g,b,a
bg=YCBCRA:y,cb,cr,a
sar=N:D
sar=[SRC|NTSC|PAL|NTSC_WIDE|PAL_WIDE]
scale=N/D
Xscale=N/D
Yscale=N/D
preset=[VCD|CVD|SVCD|DVD|DVD_WIDE|DV|DV_WIDE|
SVCD_STILL_HI|SVCD_STILL_LO|VCD_STILL_HI|VCD_STILL_LO|
ATSC_720P|ATSC_1080I|ATSC_1080P]
For the default engine, the available scaler-options select the filter kernel:
To select kernels for the x and y scaling directions independently, use two kernel names separated by a comma, e.g. option=box,quadratic.
sinc:N will give the best quality results (least aliasing), but is the slowest. The quality improves with larger values of N, as does processing time. cubic is generally regarded in the graphics world as the 3rd-order cubic spline with the best trade-off between smoothing and aliasing. box yields the worst quality results (most aliasing), but is the fastest. The default kernel is cubicK4, which has a flatter passband and sharper cutoff than cubic. (It requires the same computational power as sinc:4, but produces fewer ringing artifacts.)
The following table details the settings provided by the various target "preset=" keywords. When two values are given, the primary is for NTSC streams; the value in {braces} is for PAL streams. If the interlace value is unspecified, it is inherited from the source; otherwise, the indicated target interlacing is required.
Preset          Frame Size      Interlace      SAR             Subsampling
--------------------------------------------------------------------------
VCD             352x240{288}    none           10:11{59:54}    4:2:0-JPEG
CVD             352x480{576}    ---            20:11{59:27}    4:2:0-MPEG2
SVCD            480x480{576}    ---            15:11{59:36}    4:2:0-MPEG2
DVD             720x480{576}    ---            10:11{59:54}    4:2:0-MPEG2
DVD_WIDE        720x480{576}    ---            40:33{118:81}   4:2:0-MPEG2
DV              720x480{576}    bottom-first   10:11{59:54}    4:1:1
DV_WIDE         720x480{576}    bottom-first   40:33{118:81}   4:1:1
SVCD_STILL_HI   704x480{576}    none           10:11{59:54}    4:2:0-MPEG2
SVCD_STILL_LO   480x480{576}    none           15:11{59:36}    4:2:0-MPEG2
VCD_STILL_HI    704x480{576}    none           10:11{59:54}    4:2:0-JPEG
VCD_STILL_LO    352x240{288}    none           10:11{59:54}    4:2:0-JPEG
ATSC_720p       1280x720        none           1:1             4:2:0-MPEG2
ATSC_1080i      1920x1080       (required)     1:1             4:2:0-MPEG2
ATSC_1080p      1920x1080       none           1:1             4:2:0-MPEG2
Active and matte regions are specified using a geometry string of the form "WxH+X+Yaa". The "WxH" part specifies the size of the region, as a Width and Height in pixels. (In some cases, the "WxH" may be omitted, and the region size defaults to the full frame size.) The "+X+Y" specifies the position of the region, as an offset relative to the anchor point specified by "aa".
The "aa" code can be one of TL, TC, TR, CL, CC, CR, BL, BC, or BR. These stand for "top-left", "top-center", ..., "bottom-center", "bottom-right". These codes are not case-sensitive.
The "+X+Y" specifies the offset of the region's anchor point from the frame's anchor point. For example, "+20+30TL" means that the top-left corner of the region will be offset 20 pixels to the right and 30 pixels down from the top-left corner of the frame.
The offset values can also be negative. For example, "-4+0CC" means that the center (vertical and horizontal) of the region is offset 4 pixels to the left of the center of the frame.
The default anchoring point for geometry strings is TL, i.e. the top-left corner.
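The geometry rules above can be sketched in Python as follows. This is an illustrative reading of the description, not y4mscaler's parser: `region_rect` and its regular expression are hypothetical, centered positions on odd sizes are rounded down as an assumption, and no clipping to the frame is attempted.

```python
import re

# Anchor letters expressed in half-widths/half-heights: 0 = top/left,
# 1 = center, 2 = bottom/right.
_HALVES = {"T": 0, "L": 0, "C": 1, "B": 2, "R": 2}

# Optional "WxH", mandatory signed X and Y offsets, optional anchor code.
_GEOMETRY = re.compile(
    r"^(?:(\d+)x(\d+))?([+-]\d+)([+-]\d+)([TCB][LCR])?$", re.IGNORECASE
)


def region_rect(geometry, frame_w, frame_h):
    """Return (x, y, w, h) of the region, measured from the frame's top-left."""
    m = _GEOMETRY.match(geometry)
    if m is None:
        raise ValueError(f"bad geometry string: {geometry!r}")
    # A missing "WxH" defaults to the full frame size.
    w = int(m.group(1)) if m.group(1) else frame_w
    h = int(m.group(2)) if m.group(2) else frame_h
    dx, dy = int(m.group(3)), int(m.group(4))
    anchor = (m.group(5) or "TL").upper()  # default anchor is top-left
    ky, kx = _HALVES[anchor[0]], _HALVES[anchor[1]]
    # Region anchor = frame anchor + offset; convert back to a top-left corner.
    x = frame_w * kx // 2 + dx - w * kx // 2
    y = frame_h * ky // 2 + dy - h * ky // 2
    return x, y, w, h


print(region_rect("100x100+0+0cc", 720, 480))
print(region_rect("-4+0CC", 720, 480))
```

In a 720x480 frame, the first call describes a centered 100x100 region whose top-left corner is at (310, 190); the second describes a full-size region shifted 4 pixels to the left, so its top-left corner lies at x = -4, outside the frame.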
Often, the source and target active regions do not match exactly. This happens when, using the given or calculated scaling ratios, the source region scales to a different size or shape than the target region. In this case, the source and target regions are mutually clipped, so that only the portion of the source which fits will be scaled into the target.
Before any clipping or padding, the source and target regions are aligned so that the points specified via the "align=aa" parameter coincide. The "aa" code specifies an anchor point as described above.
For example, "align=BC" specifies that the bottom-center of the source region should get mapped to the bottom-center of the target region. In other words, the source region will be horizontally centered and vertically aligned to the bottom of the target region before clipping:
    ----------------  source
    |abcdefghijklmn|
 ---|opqrstuvwxyz01|---  target        ----------------
 |  |234567890ABCDE|  |                |234567890ABCDE|
 |  |FGHIJKLMNOPQRS|  |                |FGHIJKLMNOPQRS|
 |  |TUVWXYZabcdefg|  |                |TUVWXYZabcdefg|
 ----------------------                ----------------
         Before                        Mutually Clipped
If instead "align=TR" were specified, the source would be clipped in a different place, and scaled into a different region of the target frame:
 ----------------------                ----------------
 |     |abcdefghijklmn|                |abcdefghijklmn|
 |     |opqrstuvwxyz01|                |opqrstuvwxyz01|
 |     |234567890ABCDE|                |234567890ABCDE|
 ------|FGHIJKLMNOPQRS|                ----------------
target |TUVWXYZabcdefg| source
       ----------------
         Before                        Mutually Clipped
The default alignment mode is "CC", that is, the source and target are mutually centered.
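The align-then-clip behavior can be sketched in a few lines of Python. This is an assumption-laden illustration rather than y4mscaler's implementation: `mutual_clip` is a hypothetical helper, both sizes are taken to be already expressed in target pixels (that is, after scaling), and centered offsets on odd differences are rounded down.

```python
# Anchor letters in half-units: 0 = top/left, 1 = center, 2 = bottom/right.
_HALVES = {"T": 0, "L": 0, "C": 1, "B": 2, "R": 2}


def _clip_axis(src_len, tgt_len, k):
    """Align two spans at anchor k and intersect them.

    Returns (offset into source, offset into target, common length).
    """
    common = min(src_len, tgt_len)
    src_off = (src_len - common) * k // 2
    tgt_off = (tgt_len - common) * k // 2
    return src_off, tgt_off, common


def mutual_clip(src_size, tgt_size, align="CC"):
    """Return ((sx, sy, w, h), (tx, ty, w, h)) after aligning and clipping."""
    align = align.upper()
    ky, kx = _HALVES[align[0]], _HALVES[align[1]]
    sx, tx, w = _clip_axis(src_size[0], tgt_size[0], kx)
    sy, ty, h = _clip_axis(src_size[1], tgt_size[1], ky)
    return (sx, sy, w, h), (tx, ty, w, h)


# The diagrams above: a 14x5 source and a 20x3 target.
print(mutual_clip((14, 5), (20, 3), "BC"))
print(mutual_clip((14, 5), (20, 3), "TR"))
```

With "BC" the top two source rows are dropped and the remainder lands 3 columns in from the target's left edge; with "TR" the bottom two source rows are dropped and the remainder lands flush with the target's right edge, as in the two diagrams.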
If the X and Y scaling factors are not explicitly provided, y4mscaler will infer the factors from the source and target active regions and sample aspect ratios (SARs).
If the active regions are not compatible shape-wise (given the SARs), the source and target regions will be clipped or padded according to one of four policies. The policy is selected using the "infer=" parameter and one of the keywords PAD, CLIP, PRESERVE_X, or PRESERVE_Y. (The default is PAD.)
The policy is further affected by a choice of two other keywords, SIMPLIFY, or EXACT. (The default is SIMPLIFY.)
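The arithmetic behind such an inference can be sketched in Python. This is not y4mscaler's algorithm and does not model the policy keywords; `candidate_scales` is a hypothetical helper built on one fact only: since a pixel's displayed width is proportional to its SAR, picture shapes stay undistorted when Xscale/Yscale equals the source SAR divided by the target SAR.

```python
from fractions import Fraction


def candidate_scales(src_size, src_sar, tgt_size, tgt_sar):
    """Return two (Xscale, Yscale) pairs that keep picture shapes undistorted.

    The first pair fits the whole source inside the target (leaving bars);
    the second fills the whole target (cropping the source).
    """
    (sw, sh), (tw, th) = src_size, tgt_size
    ratio = src_sar / tgt_sar               # required Xscale / Yscale
    x_by_width = Fraction(tw, sw)           # Xscale if the widths match exactly
    x_by_height = Fraction(th, sh) * ratio  # Xscale if the heights match exactly
    x_fit, x_fill = sorted([x_by_width, x_by_height])
    return (x_fit, x_fit / ratio), (x_fill, x_fill / ratio)


# Widescreen NTSC DV (720x480, SAR 40:33) into NTSC DVD (720x480, SAR 10:11).
fit, fill = candidate_scales((720, 480), Fraction(40, 33),
                             (720, 480), Fraction(10, 11))
print("fit: ", fit)
print("fill:", fill)
```

For this pair of streams the required ratio is 4/3. Fitting gives Xscale 1 and Yscale 3/4 (a 720x360 picture with bars above and below), while filling gives Xscale 4/3 and Yscale 1 (a 960-pixel-wide picture whose sides must be cropped to 720).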
y4mscaler can convert streams from one chroma subsampling mode to another. Such conversions are always lossy operations, even if the overall frame is undergoing 1/1 scaling.
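A tiny Python sketch shows why such a conversion loses information even at 1/1 scaling. It is a stand-in only: `subsample_2x2` and `upsample_2x2` are hypothetical helpers that use plain block averaging and sample replication, whereas y4mscaler uses its scaling kernels and the sample siting differs between the 4:2:0 varieties.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a chroma plane (even width and height)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def upsample_2x2(plane):
    """Replicate each sample into a 2x2 block."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


# A full-resolution (4:4:4-style) chroma plane, 4 samples wide and 2 high.
cb = [[100, 120, 50, 50],
      [100, 120, 50, 50]]
small = subsample_2x2(cb)       # one chroma sample per 2x2 block
restored = upsample_2x2(small)  # back to full resolution
print(small)
print(restored == cb)
```

The left block's distinct values 100 and 120 collapse to a single 110, so the round trip cannot reproduce the original plane.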
y4mscaler will infer the source's subsampling mode from tags in the input stream header. The target presets ("preset=XXX") will attempt to set the target subsampling mode appropriately. Otherwise, by default the target subsampling mode will match the source. One can explicitly set the subsampling mode for the source and/or the target by using the "chromass=" parameter.
y4mscaler is capable of reading and writing streams in the 4:4:4, 4:2:2, 4:1:1, and 4:2:0 (all three varieties) subsampling modes. The first three, however, are relatively new additions to the YUV4MPEG2 standard, and many MJPEGtools will fail to process them correctly, if at all. smil2yuv and raw2yuv can produce native 4:1:1 streams from NTSC DV video, which can then be converted to 4:2:0 by y4mscaler before further processing by other tools.
If the source has an alpha-channel (i.e. 444ALPHA mode) and the target does not, the alpha channel will simply be discarded. On the other hand, if the target has an alpha-channel but the source does not, a constant alpha-channel will be created using the alpha-value of the target's background color (as set by "-O bg="). The default is fully-opaque.
Similarly, if the target has chroma channels but the source does not (i.e. a luma-only MONO stream), then the chroma channels in the output will be set according to the background color.
The YUV4MPEG2 format allows for "mixed-mode interlacing" streams, which may contain a mixture of progressive and interlaced frames. Each frame is tagged as temporally interlaced or progressive, and vertically-subsampled frames (4:2:0 formats) are further tagged as spatially interlaced or not. Unfortunately, this allows for the possibility of anomalous frames, which happen to be temporally interlaced (fields sampled at different times) but spatially progressive (subsampling performed across the entire frame), or vice-versa. The only reasonable thing to do with such anomalous frames is to vertically upsample the chroma, essentially making the problem go away as quickly as possible.
y4mscaler will only process such frames if the target output format is non-vertically-subsampled (e.g. 4:4:4, 4:2:2, etc.) and no other vertical processing is required. Otherwise, y4mscaler will abort processing midstream when it encounters an anomalous frame. If there is any possibility of encountering such an error, y4mscaler will print a warning when processing begins.
This manual page is copyright 2005 by Matthew Marjanovic.
Feel free to direct any questions, remarks, problems, or bug reports
concerning this tool to <dmg @ mir.com>.
mjpegtools(1), yuv2lav(1), mpeg2enc(1), ppmtoy4m(1), raw2yuv(1), smil2yuv(1), yuvplay(1), yuvscaler(1)
February 14, 2003 | y4mtools |