ITU Artificial Intelligence/Machine Learning in 5G Challenge


The Federal University of Pará (UFPA) invites you to participate in the 2021 ITU Artificial Intelligence/Machine Learning in 5G Challenge, a competition scheduled to run from now until the end of the year. Participation in the Challenge is free of charge and open to all interested parties from ITU member countries. Detailed information can be found on the 2021 challenge website. The following sections present problem statement ITU-ML5G-PS-006: ML5G-PHY-Reinforcement learning: scheduling and resource allocation.


Reinforcement learning (RL) is one of the most important learning paradigms in telecommunications. However, two practical issues delay the widespread adoption of RL in 5G and future networks: modeling the problem as an RL task and dealing with convergence. The PS-006 problem statement challenges participants to design an RL agent capable of performing user scheduling and beam selection.

The problem is based on a simulation methodology named CAVIAR (Communication networks and Artificial intelligence immersed in Virtual or Augmented Reality), proposed in [2]. A CAVIAR simulation integrates three components: a communication system, artificial intelligence (AI), and virtual reality (VR) or augmented reality (AR). In this ITU Challenge, problem statement PS-006 is based on simulating a communication system immersed in a virtual world created with AirSim and Unreal Engine. In [2], ray-tracing [3] is adopted for generating realistic communication channels. In PS-006, ray-tracing is not used and the channels are created based on the geometric multiple-input multiple-output (MIMO) channel model. The problem is posed as a game that must be solved with reinforcement learning (RL).

The goal in PS-006 is to schedule and allocate resources to unmanned aerial vehicles (UAVs) and terrestrial users. The RL agent is executed at the base station. It periodically takes actions based on the information captured from the environment, which includes channel estimates, buffer status, and positions from a Global Navigation Satellite System such as GPS. The communication system model is based on a MIMO system, with the base station having a uniform planar array (UPA) while the users have a single antenna. The RL agent receives a reward based on the service provided to the users. The RL agent can be trained “offline”, without rendering the 3-D scenes. An example of the RL task and the 3-D environment [4] is illustrated in the following video:

Organization of full episode data

RL tasks can be episodic or continuous. The PS-006 problem statement assumes the former category. The dataset is provided in text files using the .csv (comma-separated values) format. The data corresponding to the trajectories of all mobile objects in a complete episode is stored in a single file. These files can be found in the folder episodes_datasets (e.g., ep0.csv, ep1.csv, etc.). Each episode has a duration of approximately 3 minutes, with information stored at a sampling interval of 10 ms. Each CSV file is composed of the following columns:

timestamp obj pos_x pos_y pos_z orien_w orien_x orien_y orien_z linear_acc_x linear_acc_y linear_acc_z linear_vel_x linear_vel_y linear_vel_z angular_acc_x angular_acc_y angular_acc_z angular_vel_x angular_vel_y angular_vel_z

There are three different types of objects: uav, simulation_car and simulation_pedestrian. Only the uav type has information in all columns, while the others (car and pedestrian) only have information regarding their position and orientation. The input to the RL agent can be selected from the information in the CSV files, complemented by information obtained from the environment such as the buffer status.

Organization of baseline data

We provide a baseline RL agent to demonstrate, via a simple example, how the episode data can be used. In the following paragraph, we describe the organization of this simplified dataset used by the baseline RL agent.

A complete episode file has information about all mobile objects in a scene (all pedestrians, cars, etc.). To keep the baseline simple, we assumed that the baseline RL agent uses only information from the three users (uav1, simulation_car1 and simulation_pedestrian1). These are the user equipment (UEs) being served by the base station (BS). Hence, the baseline data was obtained by filtering the original episodes to discard the information about all other mobile objects (which are scatterers, not UEs).
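The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the organizers' script: it assumes the episode files are comma-delimited with a header row matching the column names listed earlier, the function name filter_baseline_rows is hypothetical, and the in-memory sample (including the object name simulation_car7) is made up to keep the example self-contained.

```python
import csv
import io

# Objects served by the base station in the baseline (names taken from the text above).
BASELINE_UES = {"uav1", "simulation_car1", "simulation_pedestrian1"}


def filter_baseline_rows(csv_file):
    """Yield only the rows whose 'obj' column names one of the baseline UEs."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        if row["obj"] in BASELINE_UES:
            yield row


# Tiny made-up sample with a subset of the columns, for illustration only.
sample = io.StringIO(
    "timestamp,obj,pos_x,pos_y,pos_z\n"
    "0.00,uav1,1.0,2.0,30.0\n"
    "0.00,simulation_car7,5.0,6.0,0.0\n"
    "0.00,simulation_pedestrian1,8.0,9.0,0.0\n"
    "0.01,simulation_car1,5.1,6.0,0.0\n"
)

kept = list(filter_baseline_rows(sample))
kept_names = [row["obj"] for row in kept]
print(kept_names)
```

For a real episode, the same function could be given an open file handle (for instance one of the files in episodes_datasets) instead of the in-memory sample.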

Stages of PS-006 and future developments

There are three stages of interest to PS-006 participants:

  1. Creating the episodes for training or testing the RL agent. The CaviarRenderer-ITU-v1 is executed to create the full episode data and store them as .csv files. This stage will be performed by the organizers and the required datasets will be provided.

  2. Having the files with full episodes available, participants can use the RadioStrike-noRT-v1 environment to train the RL agents. RadioStrike-noRT-v1 is an RL environment written in Python and compliant with the OpenAI Gym API, which is supported by most RL libraries, such as Stable Baselines and RLlib. The baseline RL agent uses Stable Baselines, but participants can use any other RL library.

  3. After training an RL agent, the participant feeds it the episode data as input and obtains the RL actions (output). After saving the output to a CSV file, the participant can use CaviarRenderer-ITU-v1 to render a complete episode and save it as a video file. The rendering is not performed in real time due to performance issues. The final video resembles a game in which the base station schedules users and scores points for doing so.
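To make the interaction pattern of stages 2 and 3 concrete, the sketch below runs one episode against a toy environment that follows the classic Gym convention of reset() and step(action) returning (observation, reward, done, info). ToySchedulingEnv is a hypothetical stand-in and does not reproduce RadioStrike-noRT-v1: its observations and rewards are random placeholders. The action is assumed to be a pair (user index, beam index), with three users as in the baseline and 64 beam indices as mentioned in the rules.

```python
import random

NUM_USERS = 3   # the baseline serves three UEs
NUM_BEAMS = 64  # beam indices 0..63


class ToySchedulingEnv:
    """Hypothetical stand-in with Gym-style reset()/step(); not RadioStrike-noRT-v1."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self._observe()

    def step(self, action):
        user, beam = action
        assert 0 <= user < NUM_USERS and 0 <= beam < NUM_BEAMS
        self.t += 1
        reward = self.rng.random()  # placeholder, not the PS-006 reward
        done = self.t >= self.episode_length
        return self._observe(), reward, done, {}

    def _observe(self):
        # Placeholder observation: one buffer occupancy value per user.
        return [self.rng.randint(0, 10) for _ in range(NUM_USERS)]


def run_random_episode(env, seed=1):
    """Run one episode with a random policy and collect the actions taken."""
    policy_rng = random.Random(seed)
    env.reset()
    done = False
    actions, total = [], 0.0
    while not done:
        action = (policy_rng.randrange(NUM_USERS), policy_rng.randrange(NUM_BEAMS))
        _obs, reward, done, _info = env.step(action)
        actions.append(action)
        total += reward
    return actions, total


actions, total_reward = run_random_episode(ToySchedulingEnv())
print(len(actions), total_reward)
```

A trained agent would replace the random policy, and the collected actions are what would be saved to a CSV file for rendering.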

The project will continue to evolve after the 2021 ITU Challenge ends, generating free and open-source code for CAVIAR simulations. Currently, users can run CaviarRenderer-ITU-v1 to create their own episodes, but this is not a supported feature during the 2021 ITU Challenge. Due to limited time, the PS-006 organizers will support the use of CaviarRenderer-ITU-v1 only in stage 3 above, not in stage 1. Some planned future developments are:

  1. Support for obtaining MIMO channels via ray-tracing [1, 2, 3]

  2. Enable rendering while training the RL agent

  3. Retrieve paired contextual (or out-of-band) information such as images and LIDAR data

  4. Evolved versions of RadioStrike-noRT-v1 Gym environment

Future CAVIAR simulations will be increasingly more realistic. Some of the simplifying assumptions used in PS-006 are:

  1. The UEs have a single antenna. In future versions they will have an antenna array.

  2. We have adopted DFT codebooks for the UPA at the base station and isotropic antenna elements. Hence, the radiation pattern has two main lobes (see Fig. 1 in [5]). For better visualization, CaviarRenderer-ITU-v1 shows only one lobe: the one with the smaller angular distance to the UE. In the future we can use arrays such as the one by 3GPP described in [6].

  3. Instead of obtaining the MIMO channels via ray-tracing, we adopt for PS-006 a simpler approach based on the geometric channel model [7].
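As a rough illustration of what a DFT codebook for a UPA looks like, the sketch below builds each beam as the Kronecker product of two one-dimensional DFT codewords and evaluates the combined channel magnitude |w^H h| for a single-antenna user. The 8x8 array size is an assumption chosen only because it yields 64 beams, the function names are hypothetical, and the toy channel is simply a scaled codeword rather than an output of the geometric channel model used in PS-006.

```python
import cmath
import math


def dft_codeword(n_antennas, index):
    """Column `index` of the unitary n-point DFT matrix, as a list of complex weights."""
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(-2j * math.pi * index * n / n_antennas)
            for n in range(n_antennas)]


def upa_codebook(nx, ny):
    """Kronecker products of x and y DFT codewords: nx*ny beams for an nx-by-ny UPA."""
    codebook = []
    for ix in range(nx):
        wx = dft_codeword(nx, ix)
        for iy in range(ny):
            wy = dft_codeword(ny, iy)
            codebook.append([a * b for a in wx for b in wy])
    return codebook


def combined_channel_magnitude(codeword, channel):
    """|w^H h| for a single-antenna user with channel vector h."""
    return abs(sum(w.conjugate() * h for w, h in zip(codeword, channel)))


codebook = upa_codebook(8, 8)  # assumed 8x8 array, giving 64 beams

# Toy channel perfectly aligned with beam 10 (scaled codeword), for illustration only.
channel = [8 * w for w in codebook[10]]
magnitudes = [combined_channel_magnitude(w, channel) for w in codebook]
best = max(range(len(codebook)), key=magnitudes.__getitem__)
print(best, round(magnitudes[best], 6))
```

Because the DFT codewords are orthonormal, only the aligned beam has a non-negligible magnitude in this toy case; a realistic channel would spread energy over several beams.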


For assessing the RL agent, we will use the average return over the test episodes. The return G is the summation of the rewards, defined as a weighted sum of the cell throughput and dropped packets. The dataset with test episodes is disjoint from the one provided for training.

More specifically, the return G is defined as follows. At a given time t, assume:

#pkts_buffered => total (all user buffers) number of packets available for transmission at time t
#pkts_transmitted => total number of successfully transmitted packets at time t
#pkts_dropped => total number of dropped packets at time t

The return is G = \sum_t reward[t] where the reward is given by:

if #pkts_buffered > 0:
    reward[t] = (#pkts_transmitted - 2 * #pkts_dropped) / #pkts_buffered
else:
    reward[t] = 0
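The reward rule and the return G can also be written as plain Python functions. This is a minimal sketch that reads the rule as giving a reward of zero whenever no packets are buffered; the function names and the three example time steps are made up for illustration.

```python
def reward(pkts_buffered, pkts_transmitted, pkts_dropped):
    """Reward at one time step, following the rule stated above."""
    if pkts_buffered > 0:
        return (pkts_transmitted - 2 * pkts_dropped) / pkts_buffered
    return 0.0


def episode_return(steps):
    """Return G: the sum of the per-step rewards over an episode.

    `steps` is an iterable of (pkts_buffered, pkts_transmitted, pkts_dropped).
    """
    return sum(reward(*step) for step in steps)


# Made-up counts for three time steps.
steps = [
    (10, 6, 1),  # (6 - 2*1) / 10 = 0.4
    (0, 0, 0),   # nothing buffered, reward 0
    (4, 4, 0),   # (4 - 0) / 4 = 1.0
]
G = episode_return(steps)
print(G)
```

Note that a dropped packet costs twice as much as a transmitted packet earns, so the reward at a time step can be negative.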

The participant must provide the predicted output for the test set as a CSV (comma-separated values) file along with the code to reproduce the result. Each row corresponds to the RL agent action (output) for the given time instant. The participant must also provide two video files: a) one with the rendering of the RL agent for the episode with the largest value of the return G, and b) another for the episode with the smallest value of G. These two videos should correspond to the output of CaviarRenderer-ITU-v1 (or a later version). If desired, the participant can concatenate these videos into a single video. Another option is to expand the video (or videos) with material promoting the designed RL agent, describing its architecture or anything else the participant finds interesting. This promotional video must be recorded in Portuguese or English.

In case of a draw regarding the average return G among distinct participants, we will break it using the best video(s).


Models must be trained only with data included in the provided training episodes. Additional data may not be used, whether extracted from other datasets, generated by the participant, or obtained by data augmentation.

The inputs to the RL agent (the environment observation, also called state) can be selected from the information provided in the CSV files (position (x,y,z), speeds, etc.) and from information obtained from the environment, such as the buffer status and the channel information for the specific beam index previously selected. Information that is infeasible to obtain in practice cannot be used. For instance, the simulator knows the magnitude of the combined channels h_i for all i=0,…,63 beam indices, but in practice this would require a complete beam sweep, with the transmitter "trying" each of the 64 indices. This complete information cannot be used as input to the agent. However, after a given index j is chosen by the agent at time t, the magnitude h_j of the combined channel is available to be part of the input at time t+1.
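One way to respect this rule in code is to copy only the magnitude of the previously chosen beam into the observation. The sketch below is a hypothetical illustration: the function name, the dictionary keys and the made-up magnitudes are assumptions and do not correspond to the observation format of the provided environment.

```python
def observation_for_next_step(positions, buffer_status, all_magnitudes, chosen_beam):
    """Build the agent input for time t+1 from what is observable at time t.

    `all_magnitudes` stands for the 64 values known only to the simulator;
    just the entry of the beam chosen at time t is copied into the observation.
    """
    return {
        "positions": positions,
        "buffer_status": buffer_status,
        "previous_beam": chosen_beam,
        "previous_beam_magnitude": all_magnitudes[chosen_beam],
    }


# Made-up simulator-side magnitudes for the 64 beam indices.
simulator_magnitudes = [0.01 * i for i in range(64)]

obs = observation_for_next_step(
    positions=[(1.0, 2.0, 30.0)],
    buffer_status=[5],
    all_magnitudes=simulator_magnitudes,
    chosen_beam=17,
)
print(obs)
```

The full list of 64 magnitudes never enters the returned observation, which mirrors the restriction that a complete beam sweep is not available to the agent.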

The participants must train models that learn from data using the reinforcement learning (RL) paradigm, without using “prior” information obtained from the provided code that generates the MIMO channels or other data. For instance, the MIMO channel has been generated to mimic a line-of-sight (LoS) situation, such that the best beam index is primarily determined by the user position (x,y,z). The participant cannot use such information or any other obtained via “reverse engineering” the provided software.

You can participate in teams of at most four people. The team members should be announced at the enrollment stage and will be considered to have an equal contribution.

The enrolled participants will receive access to upload files to a cloud storage server. Each team is required to upload:

  1. CSV files with the RL agent actions (outputs) for each of the Ne test episodes. Each csv file will have Ns rows, where Ns is the number of scenes per episode, and each row has 2 values: the scheduled user index and the corresponding precoding vector index.

  2. A report describing the proposed solution (from one to a maximum of three pages), written preferably in English, although Portuguese is also allowed. This report will not be disclosed, so that the participant can later publish it elsewhere if desired.

  3. Videos (or a single video) as explained in the Evaluation section.
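The action files of item 1 can be produced with the Python standard library. This is a sketch under assumptions: it writes one row per scene with the scheduled user index first and the precoding vector index second, uses a comma delimiter and no header row, and the function name write_actions is hypothetical. The exact layout expected by the organizers should be checked against the baseline code.

```python
import csv
import io


def write_actions(csv_file, actions):
    """Write one row per scene: scheduled user index, precoding vector index."""
    writer = csv.writer(csv_file)
    for user_index, beam_index in actions:
        writer.writerow([user_index, beam_index])


# Made-up actions for an episode with Ns = 3 scenes.
actions = [(0, 12), (2, 63), (1, 0)]

buffer = io.StringIO()  # an open file would be used in practice
write_actions(buffer, actions)
lines = buffer.getvalue().splitlines()
print(lines)
```

When writing to a real file, the csv module documentation recommends opening it with newline="" so that row terminators are not translated twice.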

The participants responsible for the top-five submissions will be later requested to complement their submission with the following:

  1. Source code of the proposed solution for the test stage, including the RL agent model and any required preprocessing stage.

  2. Source code of the proposed solution for the training stage, including any required preprocessing stage.

In summary, the provided code and RL model must allow us to reproduce the reported results for the test set and to regenerate (train) the RL model based on the training set.

* The requested code and any other material are just for review purposes. All participants keep their intellectual property regarding any submitted material.

To rank the models we will use the test dataset, which will be disclosed to the participants only after they submit their work.

All participants of the PS-006 ML5G-PHY [reinforcement learning] task are required to register at the ITU website.

Final submissions and awards

Top-3 solutions of this challenge will have access to the global round of the ITU AI/ML in 5G Challenge, which has the following prizes among others:

  • 1st prize: 5,000 CHF
  • 2nd prize: 3,000 CHF
  • 3rd prize: 2,000 CHF

Besides, the top-3 teams in the PS-006 challenge will be recognized on this website and will receive certificates issued by LASSE / UFPA.

Contact and updates

PS-006 ITU site


  • Preliminary dataset (to allow users to get familiar with the information retrieved from the RadioStrike-v0 environment) → July 01, 2021

  • Baseline reinforcement learning code (example to illustrate how to execute an RL agent for beam selection with the RadioStrike-v0 environment) → July 01, 2021

  • The first part of the training data (500 episodes) → July 23, 2021

  • The second part of the training data (all RL agents are required to be trained with the first and second parts of the dataset) → August 10, 2021

  • Release of test episodes to allow participants to calculate and inform the performance of their agents → September 17, 2021

  • Final submission by participants of their RL agents → October 15, 2021


Francisco Müller


Aldebaro Klautau


Developers and Support

João Borges


Felipe Bastos


Ailton Oliveira


Emerson Oliveira


Camila Novaes


Daniel Takashi


Lucas Matni


Rebecca Aben-Athar



[1] Ailton Oliveira, Felipe Bastos, Isabela Trindade, Walter Frazão, Arthur Nascimento, Diego Gomes, Francisco Müller, Aldebaro Klautau. Simulation of Machine Learning-Based 6G Systems in Virtual Worlds. Submitted to: ITU Journal on Future and Evolving Technologies, 2021.

[2] Aldebaro Klautau, Ailton Oliveira, Isabela Trindade, Wesin Alves. Generating MIMO Channels For 6G Virtual Worlds Using Ray-tracing Simulations. IEEE Statistical Signal Processing Workshop, 2021. ArXiv version is available here.

[3] Aldebaro Klautau, Pedro Batista, Nuria Gonzalez-Prelcic, Yuyang Wang, Robert W. Heath Jr. 5G MIMO Data for Machine Learning: Application to Beam-Selection using Deep Learning. Information Theory and Applications Workshop (ITA), Feb. 2018. DOI: 10.1109/ITA.2018.8503086. PDF preprint is also available here.

[4] The adopted scenario is this pack that includes 355 unique models by Leartes Studios. They have many other cool packs such as the ones listed here. To protect the author’s intellectual property, we follow their guidelines and do not distribute the 3-D models but only the corresponding executable.

[5] Mattia Rebato, Laura Resteghini, Christian Mazzucco and Michele Zorzi, “Study of Realistic Antenna Patterns in 5G mmWave Cellular Scenarios”, IEEE ICC Communications QoS, Reliability, and Modeling Symposium, 2018. Preprint here.

[6] 3GPP TR 37.840 v12.1.0, “Technical Specification Group Radio Access Network; Study of Radio Frequency (RF) and Electromagnetic Compatibility (EMC) requirements for Active Antenna Array System (AAS) base station,” Tech. Rep., 2013. See Table Composite array pattern for multiple columns.

[7] Robert W. Heath Jr., Nuria Gonzalez-Prelcic, S. Rangan, W. Roh, A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems”, IEEE Journal of selected topics in signal processing 10 (3), 436-453, 2016.