docs/av_api.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194

# A/V API reference

## Take toxmsi/phone.c as a reference

### Initialization:

```
phone_t* initPhone(uint16_t _listen_port, uint16_t _send_port);
```

function initializes sample phone. _listen_port and _send_port are variables only meant
for local testing. You will not have to do anything regarding to that since
everything will be started within a messenger.


Phone requires one msi session and two rtp sessions ( one for audio and one for
video ). 

```
msi_session_t* msi_init_session( void* _core_handler, const uint8_t* _user_agent );
```

initializes msi session.
Params:

```
void* _core_handler - pointer to an object handling networking,
const uint8_t* _user_agent - string describing phone client version.
```

Return value:
msi_session_t* - pointer to a newly created msi session handler.

### msi_session_t reference:

How to handle msi session:
Controlling is done via callbacks and action handlers.
First register callbacks for every state/action received and make sure
NOT TO PLACE SOMETHING LIKE LOOPS THAT TAKES A LOT OF TIME TO EXECUTE; every callback is being called 
directly from event loop. You can find examples in phone.c.

Register callbacks: 
```
void msi_register_callback_call_started ( MCALLBACK );
void msi_register_callback_call_canceled ( MCALLBACK );
void msi_register_callback_call_rejected ( MCALLBACK );
void msi_register_callback_call_ended ( MCALLBACK );

void msi_register_callback_recv_invite ( MCALLBACK );
void msi_register_callback_recv_ringing ( MCALLBACK );
void msi_register_callback_recv_starting ( MCALLBACK );
void msi_register_callback_recv_ending ( MCALLBACK );
void msi_register_callback_recv_error ( MCALLBACK );

void msi_register_callback_requ_timeout ( MCALLBACK );
```

MCALLBACK is defined as: void (*callback) (void* _arg)
msi_session_t* handler is being thrown as \_arg so you can use that and \_agent_handler to get to your own phone handler
directly from callback.


Actions:

```
int msi_invite ( msi_session_t* _session, call_type _call_type, uint32_t _timeoutms );
```

Sends call invite. Before calling/sending invite msi_session_t::_friend_id is needed to be set or else
it will not work. _call_type is type of the call ( Audio/Video ) and _timeoutms is how long 
will poll wait until request is terminated.

```
int msi_hangup ( msi_session_t* _session );
```
Hangs up active call

```
int msi_answer ( msi_session_t* _session, call_type _call_type );
```
Answer incoming call. _call_type set's callee call type.

```
int msi_cancel ( msi_session_t* _session );
```
Cancel current request.

```
int msi_reject ( msi_session_t* _session );
```
Reject incoming call.


### Now for rtp:

You will need 2 sessions; one for audio one for video.
You start them with:
```
rtp_session_t* rtp_init_session ( int _max_users, int _multi_session );
```

Params:
```
int _max_users - max users. -1 if undefined
int _multi_session - any positive number means uses multi session; -1 if not.
```

Return value:
```
rtp_session_t* - pointer to a newly created rtp session handler.
```

### How to handle rtp session:
Take a look at
```
void* phone_handle_media_transport_poll ( void* _hmtc_args_p ) in phone.c
```
on example. Basically what you do is just receive a message via:
```
struct rtp_msg_s* rtp_recv_msg ( rtp_session_t* _session );
```

and then you use payload within the rtp_msg_s struct. Don't forget to deallocate it with:
void rtp_free_msg ( rtp_session_t* _session, struct rtp_msg_s* _msg );
Receiving should be thread safe so don't worry about that.

When you capture and encode a payload you want to send it ( obviously ).

first create a new message with:
```
struct rtp_msg_s* rtp_msg_new ( rtp_session_t* _session, const uint8_t* _data, uint32_t _length );
```

and then send it with:
```
int rtp_send_msg ( rtp_session_t* _session, struct rtp_msg_s* _msg, void* _core_handler );
```

_core_handler is the same network handler as in msi_session_s struct.


## A/V initialization:
```
int init_receive_audio(codec_state *cs);
int init_receive_video(codec_state *cs);
Initialises the A/V decoders. On failure it will print the reason and return 0. On success it will return 1.

int init_send_audio(codec_state *cs);
int init_send_video(codec_state *cs);
Initialises the A/V encoders. On failure it will print the reason and return 0. On success it will return 1.
init_send_audio will also let the user select an input device. init_send_video will determine the webcam's output codec and initialise the appropriate decoder.

int video_encoder_refresh(codec_state *cs, int bps);
Reinitialises the video encoder with a new bitrate. ffmpeg does not expose the needed VP8 feature to change the bitrate on the fly, so this serves as a workaround.
In the future, VP8 should be used directly and ffmpeg should be dropped from the dependencies.
The variable bps is the required bitrate in bits per second.
```


### A/V encoding/decoding:
```
void *encode_video_thread(void *arg);
```
Spawns the video encoding thread. The argument should hold a pointer to a codec_state.
This function should only be called if video encoding is supported (when init_send_video returns 1).
Each video frame gets encoded into a packet, which is sent via RTP. Every 60 frames a new bidirectional interframe is encoded.
```
void *encode_audio_thread(void *arg);
```
Spawns the audio encoding thread. The argument should hold a pointer to a codec_state.
This function should only be called if audio encoding is supported (when init_send_audio returns 1).
Audio frames are read from the selected audio capture device during initialisation. This audio capturing can be rerouted to a different device on the fly.
Each audio frame is encoded into a packet, and sent via RTP. All audio frames have the same amount of samples, which is defined in AV_codec.h.
```
int video_decoder_refresh(codec_state *cs, int width, int height);
```
Sets the SDL window dimensions and creates a pixel buffer with the requested size. It also creates a scaling context, which will be used to convert the input image format to YUV420P.

```
void *decode_video_thread(void *arg);
```
Spawns a video decoding thread. The argument should hold a pointer to a codec_state. The codec_state is assumed to contain a successfully initialised video decoder.
This function reads video packets and feeds them to the video decoder. If the video frame's resolution has changed, video_decoder_refresh() is called. Afterwards, the frame is displayed on the SDL window.
```
void *decode_audio_thread(void *arg);
```
Spawns an audio decoding thread. The argument should hold a pointer to a codec_state. The codec_state is assumed to contain a successfully initialised audio decoder.
All received audio packets are pushed into a jitter buffer and are reordered. If there is a missing packet, or a packet has arrived too late, it is treated as a lost packet and the audio decoder is informed of the packet loss. The audio decoder will then try to reconstruct the lost packet, based on information from previous packets.
Audio is played on the default OpenAL output device.


If you have any more qustions/bug reports/feature request contact the following users on the irc channel #tox-dev on irc.freenode.net:
For RTP and MSI: mannol
For audio and video: Martijnvdc