Filter design
=============

This document explains guidelines that should be observed (or ignored with
good reason) when writing filters for libavfilter.

In this document, the word “frame” indicates either a video frame or a group
of audio samples, as stored in an AVFilterBuffer structure.


Format negotiation
==================

  The query_formats method should set, for each input and each output link,
  the list of supported formats.

  For video links, that means pixel format. For audio links, that means
  channel layout, sample format (the sample packing is implied by the sample
  format) and sample rate.

  The lists are not just lists; they are references to shared objects. When
  the negotiation mechanism computes the intersection of the formats
  supported at each end of a link, all references to both lists are replaced
  with a reference to the intersection. And when a single format is
  eventually chosen for a link amongst the remaining list, again, all
  references to the list are updated.

  That means that if a filter requires that its input and output have the
  same format amongst a supported list, all it has to do is use a reference
  to the same list of formats.
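
  The sharing mechanism described above can be sketched with a small,
  self-contained toy model. This is not libavfilter code: the FormatList
  type and the format_list_ref and format_list_merge functions are
  hypothetical names chosen for illustration, and formats are plain
  integers. The sketch only assumes that a shared list records the address
  of every slot that points at it, so that all of them can be redirected in
  one step.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FORMATS 8
#define MAX_REFS    8

/* Toy model of a shared format list (hypothetical type): the list knows
 * every slot that points at it, so all of them can be redirected at once. */
typedef struct FormatList {
    int formats[MAX_FORMATS];
    int nb_formats;
    struct FormatList **refs[MAX_REFS]; /* addresses of the referencing slots */
    int nb_refs;
} FormatList;

/* Make *slot reference list and record the slot in the list. */
static void format_list_ref(FormatList *list, FormatList **slot)
{
    assert(list->nb_refs < MAX_REFS);
    *slot = list;
    list->refs[list->nb_refs++] = slot;
}

/* Reduce a to the intersection of a and b, then redirect every reference
 * to b so that it points at a. Returns the number of common formats
 * (0 would mean that negotiation failed). */
static int format_list_merge(FormatList *a, FormatList *b)
{
    int i, j, n = 0;

    if (a == b)
        return a->nb_formats;
    for (i = 0; i < a->nb_formats; i++)
        for (j = 0; j < b->nb_formats; j++)
            if (a->formats[i] == b->formats[j]) {
                a->formats[n++] = a->formats[i]; /* n <= i, safe in place */
                break;
            }
    a->nb_formats = n;
    for (i = 0; i < b->nb_refs; i++)
        format_list_ref(a, b->refs[i]);
    b->nb_refs = 0;
    return n;
}
```

  In this model, a filter that wants the same format on its input and its
  output registers both slots on one list with format_list_ref. After
  format_list_merge runs for either link, both slots point at the same
  reduced list.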

  query_formats can leave some formats unset and return AVERROR(EAGAIN) to
  cause the negotiation mechanism to try again later. That can be used by
  filters with complex requirements to use the format negotiated on one link
  to set the formats supported on another.


Buffer references ownership and permissions
===========================================

  Principle
  ---------

    Audio and video data are voluminous; the buffer and buffer reference
    mechanism is intended to avoid, as much as possible, expensive copies of
    that data while still allowing the filters to produce correct results.

    The data is stored in buffers represented by AVFilterBuffer structures.
    They must not be accessed directly, but through references stored in
    AVFilterBufferRef structures. Several references can point to the
    same buffer; the buffer is automatically deallocated once all
    corresponding references have been destroyed.

    The characteristics of the data (resolution, sample rate, etc.) are
    stored in the reference; different references for the same buffer can
    show different characteristics. In particular, a video reference can
    point to only a part of a video buffer.

    A reference is usually obtained as input to the start_frame or
    filter_frame method or requested using the ff_get_video_buffer or
    ff_get_audio_buffer functions. A new reference on an existing buffer can
    be created with the avfilter_ref_buffer function. A reference is
    destroyed using the avfilter_unref_bufferp function.
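
    The buffer and reference relationship can be illustrated with a toy
    model. The sketch below does not use the libavfilter structures: the
    Buffer and BufferRef types and the toy_get_buffer, toy_ref_buffer and
    toy_unref_bufferp functions are hypothetical stand-ins, and the
    live_buffers counter exists only so that deallocation can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* The shared data, counted by the number of references pointing at it. */
typedef struct Buffer {
    unsigned char *data;
    int refcount;
} Buffer;

/* A reference: the characteristics (here w and h) live in the reference,
 * not in the buffer, so two references can describe the data differently. */
typedef struct BufferRef {
    Buffer *buf;
    int w, h;
} BufferRef;

static int live_buffers; /* number of buffers not yet deallocated */

/* Allocate a buffer and return the first reference to it. */
static BufferRef *toy_get_buffer(int w, int h)
{
    BufferRef *ref = calloc(1, sizeof(*ref));
    Buffer *buf = calloc(1, sizeof(*buf));

    assert(w > 0 && h > 0);
    if (!ref || !buf)
        goto fail;
    buf->data = calloc((size_t)w * h, 1);
    if (!buf->data)
        goto fail;
    buf->refcount = 1;
    ref->buf = buf;
    ref->w = w;
    ref->h = h;
    live_buffers++;
    return ref;
fail:
    free(buf);
    free(ref);
    return NULL;
}

/* Create a new reference on the buffer behind an existing reference. */
static BufferRef *toy_ref_buffer(const BufferRef *ref)
{
    BufferRef *copy = malloc(sizeof(*copy));

    if (!copy)
        return NULL;
    *copy = *ref;
    copy->buf->refcount++;
    return copy;
}

/* Destroy a reference and reset the caller's pointer; the buffer itself is
 * deallocated when its last reference goes away. */
static void toy_unref_bufferp(BufferRef **refp)
{
    BufferRef *ref = *refp;

    if (!ref)
        return;
    if (--ref->buf->refcount == 0) {
        free(ref->buf->data);
        free(ref->buf);
        live_buffers--;
    }
    free(ref);
    *refp = NULL;
}
```

    Resetting the caller's pointer in toy_unref_bufferp is a design choice
    of this sketch: it makes an accidental use of a destroyed reference fail
    early instead of silently reading freed memory.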

  Reference ownership
  -------------------

    At any time, a reference “belongs” to a particular piece of code,
    usually a filter. With a few caveats that will be explained below, only
    that piece of code is allowed to access it. It is also responsible for
    destroying it, although this is sometimes done automatically (see the
    section on link reference fields).

    Here are the (fairly obvious) rules for reference ownership:

    * A reference received by the filter_frame method (or its deprecated
      start_frame version) belongs to the corresponding filter.

      Special exception for video references: the reference may be used
      internally for automatic copying and must not be destroyed before
      end_frame; it can be given away to ff_start_frame.

    * A reference passed to ff_filter_frame (or the deprecated
      ff_start_frame) is given away and must no longer be used.

    * A reference created with avfilter_ref_buffer belongs to the code that
      created it.

    * A reference obtained with ff_get_video_buffer or ff_get_audio_buffer
      belongs to the code that requested it.

    * A reference given as return value by the get_video_buffer or
      get_audio_buffer method is given away and must no longer be used.

  Link reference fields
  ---------------------

    The AVFilterLink structure has a few AVFilterBufferRef fields. The
    cur_buf and out_buf fields were used with the deprecated
    start_frame/draw_slice/end_frame API and should no longer be used.
    src_buf, cur_buf_copy and partial_buf are used by libavfilter internally
    and must not be accessed by filters.

  Reference permissions
  ---------------------

    The AVFilterBufferRef structure has a perms field that describes what
    the code that owns the reference is allowed to do to the buffer data.
    Different references for the same buffer can have different permissions.

    For video filters that implement the deprecated
    start_frame/draw_slice/end_frame API, the permissions only apply to the
    parts of the buffer that have already been covered by the draw_slice
    method.

    The value is a binary OR of the following constants:

    * AV_PERM_READ: the owner can read the buffer data; this is essentially
      always true and is there for self-documentation.

    * AV_PERM_WRITE: the owner can modify the buffer data.

    * AV_PERM_PRESERVE: the owner can rely on the fact that the buffer data
      will not be modified by previous filters.

    * AV_PERM_REUSE: the owner can output the buffer several times, without
      modifying the data in between.

    * AV_PERM_REUSE2: the owner can output the buffer several times and
      modify the data in between (useless without the WRITE permission).

    * AV_PERM_ALIGN: the owner can access the data using fast operations
      that require data alignment.

    The READ, WRITE and PRESERVE permissions are about sharing the same
    buffer between several filters to avoid expensive copies without them
    making conflicting changes to the data.

    The REUSE and REUSE2 permissions are about special memory for direct
    rendering. For example, a buffer directly allocated in video memory must
    not be modified once it is displayed on screen, or it will cause
    tearing; it will therefore not have the REUSE2 permission.

    The ALIGN permission is about extracting part of the buffer, for
    copy-less padding or cropping for example.


    References received on input pads are guaranteed to have all the
    permissions stated in the min_perms field and none of the permissions
    stated in the rej_perms.

    References obtained by ff_get_video_buffer and ff_get_audio_buffer are
    guaranteed to have at least all the permissions requested as argument.

    References created by avfilter_ref_buffer have the same permissions as
    the original reference minus the ones explicitly masked; the mask is
    usually ~0 to keep the same permissions.

    Filters should remove permissions on references they give to output
    whenever necessary. This can be done automatically by setting the
    rej_perms field on the output pad.
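
    The bit arithmetic behind these rules is small enough to sketch. The
    code below is an illustration under stated assumptions, not the
    libavfilter implementation: the TOY_PERM_* constants use illustrative
    bit values (the real AV_PERM_* values are defined by the libavfilter
    headers), and toy_ref_perms and toy_perms_ok are hypothetical helpers.

```c
#include <assert.h>

/* Illustrative bit values only; the real constants come from libavfilter. */
enum {
    TOY_PERM_READ     = 0x01,
    TOY_PERM_WRITE    = 0x02,
    TOY_PERM_PRESERVE = 0x04,
    TOY_PERM_REUSE    = 0x08,
    TOY_PERM_REUSE2   = 0x10,
    TOY_PERM_ALIGN    = 0x40
};

/* Permissions of a new reference: those of the original reference minus
 * the ones masked out. A mask of ~0 keeps everything. */
static int toy_ref_perms(int orig_perms, int pmask)
{
    return orig_perms & pmask;
}

/* A reference satisfies a pad if it has every permission in min_perms and
 * none of the permissions in rej_perms. */
static int toy_perms_ok(int perms, int min_perms, int rej_perms)
{
    /* A pad cannot both require and reject the same permission. */
    assert(!(min_perms & rej_perms));
    return (perms & min_perms) == min_perms && !(perms & rej_perms);
}
```

    For instance, a filter that keeps a frame and gives away a second
    reference to it would, in this model, compute the outgoing permissions
    as toy_ref_perms(perms, ~TOY_PERM_WRITE).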

    Here are a few guidelines corresponding to common situations:

    * Filters that modify and forward their frame (like drawtext) need the
      WRITE permission.

    * Filters that read their input to produce a new frame on output (like
      scale) need the READ permission on input and must request a buffer
      with the WRITE permission.

    * Filters that intend to keep a reference after the filtering process
      is finished (after filter_frame returns) must have the PRESERVE
      permission on it and remove the WRITE permission if they create a new
      reference to give it away.

    * Filters that intend to modify a reference they have kept after the end
      of the filtering process need the REUSE2 permission and must remove
      the PRESERVE permission if they create a new reference to give it
      away.


Frame scheduling
================

  The purpose of these rules is to ensure that frames flow in the filter
  graph without getting stuck and accumulating somewhere.

  Simple filters that output one frame for each input frame should not have
  to worry about it.

  filter_frame
  ------------

    This method is called when a frame is pushed to the filter's input. It
    can be called at any time except in a reentrant way.

    If the input frame is enough to produce output, then the filter should
    push the output frames on the output link immediately.

    As an exception to the previous rule, if the input frame is enough to
    produce several output frames, then the filter only needs to output at
    least one frame per link. The additional frames can be left buffered in
    the filter; these buffered frames must be flushed immediately if a new
    input produces new output.

    (Example: frame rate-doubling filter: filter_frame must (1) flush the
    second copy of the previous frame, if it is still there, (2) push the
    first copy of the incoming frame, (3) keep the second copy for later.)
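
    The three steps of the rate-doubling example can be sketched as a toy
    model. Nothing here is libavfilter API: frames are plain integers, the
    Doubler structure stands in for the filter's private context and its
    output link, and toy_push, toy_filter_frame and toy_request_frame are
    hypothetical names.

```c
#include <assert.h>

#define MAX_OUT 16

typedef struct Doubler {
    int has_pending;   /* a second copy is waiting to be flushed */
    int pending;       /* the buffered frame (an id in this toy model) */
    int out[MAX_OUT];  /* stands in for the output link */
    int nb_out;
} Doubler;

/* Push one frame on the simulated output link. */
static void toy_push(Doubler *d, int frame)
{
    assert(d->nb_out < MAX_OUT);
    d->out[d->nb_out++] = frame;
}

/* Called for each incoming frame. */
static int toy_filter_frame(Doubler *d, int frame)
{
    if (d->has_pending) {      /* (1) flush the leftover second copy */
        toy_push(d, d->pending);
        d->has_pending = 0;
    }
    toy_push(d, frame);        /* (2) push the first copy immediately */
    d->pending = frame;        /* (3) keep the second copy for later */
    d->has_pending = 1;
    return 0;
}

/* Called when a frame is wanted on the output. Returns 1 if the buffered
 * copy was pushed, 0 if the caller has to request a frame on the input. */
static int toy_request_frame(Doubler *d)
{
    if (!d->has_pending)
        return 0;
    toy_push(d, d->pending);
    d->has_pending = 0;
    return 1;
}
```

    Whichever comes first, a new input frame or a request on the output,
    the buffered copy is pushed before anything newer, so the output order
    is preserved.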

    If the input frame is not enough to produce output, the filter must not
    call request_frame to get more. It must just process the frame or queue
    it. The task of requesting more frames is left to the filter's
    request_frame method or the application.

    If a filter has several inputs, the filter must be ready for frames
    arriving randomly on any input. Therefore, any filter with several inputs
    will most likely require some kind of queuing mechanism. It is perfectly
    acceptable to have a limited queue and to drop frames when the inputs
    are too unbalanced.

  request_frame
  -------------

    This method is called when a frame is wanted on an output.

    For a source (an input of the filter graph), it should directly call
    filter_frame on the corresponding output.

    For a filter, if there are queued frames already ready, one of these
    frames should be pushed. If not, the filter should request a frame on
    one of its inputs, repeatedly until at least one frame has been pushed.

    Return values:

    * if request_frame could produce a frame, it should return 0;

    * if it could not for temporary reasons, it should return
      AVERROR(EAGAIN);

    * if it could not because there are no more frames, it should return
      AVERROR_EOF.

    The typical implementation of request_frame for a filter with several
    inputs will look like this:

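        /* Frames that are already queued and ready go out first. */
        if (frames_queued) {
            push_one_frame();
            return 0;
        }
        /* frame_pushed is expected to be set by filter_frame, which runs
         * as a reaction to ff_request_frame. */
        while (!frame_pushed) {
            input = input_where_a_frame_is_most_needed();
            ret = ff_request_frame(input);
            if (ret == AVERROR_EOF) {
                process_eof_on_input();
            } else if (ret < 0) {
                return ret;
            }
        }
        return 0;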

    Note that, except for filters that can have queued frames, request_frame
    does not push frames: it requests them from its input, and as a reaction,
    the filter_frame method will be called and do the work.
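
    To make the interplay between request_frame and filter_frame concrete,
    here is a runnable toy simulation of a filter with two inputs. It is a
    sketch under assumptions, not libavfilter code: the Toy structure,
    TOY_EOF (standing in for AVERROR_EOF) and all toy_* functions are
    hypothetical, upstream sources are reduced to counters, and "the input
    where a frame is most needed" is arbitrarily taken to be the live input
    that has delivered the fewest frames.

```c
#include <assert.h>

#define TOY_EOF   (-1)  /* stands in for AVERROR_EOF in this model */
#define NB_INPUTS 2

typedef struct Toy {
    int remaining[NB_INPUTS]; /* frames each upstream source can still give */
    int received[NB_INPUTS];  /* frames received so far on each input */
    int eof[NB_INPUTS];       /* input has reported end of stream */
    int frame_pushed;         /* set by toy_filter_frame */
    int nb_out;               /* frames pushed on the output */
} Toy;

/* Called when a frame arrives on an input: forward it immediately. */
static void toy_filter_frame(Toy *t, int input)
{
    assert(input >= 0 && input < NB_INPUTS);
    t->received[input]++;
    t->nb_out++;
    t->frame_pushed = 1;
}

/* Model of a request on one input: upstream either pushes a frame, which
 * triggers toy_filter_frame, or reports end of stream. */
static int toy_request_input(Toy *t, int input)
{
    if (t->remaining[input] == 0)
        return TOY_EOF;
    t->remaining[input]--;
    toy_filter_frame(t, input);
    return 0;
}

/* Called when a frame is wanted on the output. */
static int toy_request_frame(Toy *t)
{
    int i, input, ret;

    t->frame_pushed = 0;
    while (!t->frame_pushed) {
        /* Pick the live input that has delivered the fewest frames. */
        input = -1;
        for (i = 0; i < NB_INPUTS; i++)
            if (!t->eof[i] &&
                (input < 0 || t->received[i] < t->received[input]))
                input = i;
        if (input < 0)
            return TOY_EOF;     /* every input is finished */
        ret = toy_request_input(t, input);
        if (ret == TOY_EOF)
            t->eof[input] = 1;  /* note the EOF and try another input */
        else if (ret < 0)
            return ret;
    }
    return 0;
}
```

    In this model toy_request_frame never pushes a frame itself: it only
    asks an input, and the output appears as a side effect of
    toy_filter_frame being called.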

Legacy API
==========

  Until libavfilter 3.23, the filter_frame method was split:

  - for video filters, it was made of start_frame, draw_slice (that could be
    called several times on distinct parts of the frame) and end_frame;

  - for audio filters, it was called filter_samples.