The Federal University of Pará (UFPA) invites you to participate in the 2021 ITU Artificial Intelligence/Machine Learning in 5G Challenge, a competition scheduled to run until the end of the year. Participation in the Challenge is free of charge and open to all interested parties from ITU member countries. Detailed information can be found on the 2021 challenge website. The following sections present problem statement ITU-ML5G-PS-006, "ML5G-PHY-Reinforcement learning: scheduling and resource allocation".
Reinforcement learning (RL) is one of the most important learning paradigms in telecommunications. However, modeling the RL problem and dealing with convergence are two practical issues that delay the widespread adoption of RL in 5G and future networks. The PS-006 problem statement challenges participants to design an RL agent capable of performing user scheduling and beam selection.
The problem is based on a simulation methodology named CAVIAR (Communication networks and Artificial intelligence immersed in Virtual or Augmented Reality) proposed in [1]. A CAVIAR simulation integrates three components: a communication system, artificial intelligence (AI), and virtual reality (VR) or augmented reality (AR). In this ITU Challenge, problem statement PS-006 simulates a communication system immersed in a virtual world created with AirSim and Unreal Engine. In [1], ray-tracing [2, 3] is adopted for generating realistic communication channels. In PS-006, ray-tracing is not used and the channels are instead created with the geometric multiple-input multiple-output (MIMO) channel model. The problem is posed as a game that must be solved with reinforcement learning (RL).
The goal in PS-006 is to schedule and allocate resources to unmanned aerial vehicles (UAVs) and terrestrial users. The RL agent is executed at the base station. It periodically takes actions based on the information captured from the environment, which includes channel estimates, buffer status, and positions from a Global Navigation Satellite System such as GPS. The communication system model is based on a MIMO system in which the base station has a uniform planar array (UPA) and each user has a single antenna. The RL agent receives a reward based on the service provided to the users. The RL agent can be trained "offline", without rendering the 3-D scenes. An example of the RL task and the 3-D environment [4] is illustrated in the accompanying video.
RL tasks can be episodic or continuing. The PS-006 problem statement
assumes episodic tasks. The dataset is provided in text files using the
.csv (comma-separated values) format. The data corresponding to the
trajectories of all mobile objects in a complete episode is stored in a single
file. These files can be found in the dataset folder (ep1.csv, etc.). Each
episode has a duration of approximately 3
minutes, with information stored with a sampling interval of 10 ms. Each CSV
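file stores, for each object and time instant, quantities such as position, orientation, and speed.
There are three different types of objects: UAVs, cars, and pedestrians (for example, simulation_pedestrian). Only the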
uav type has information in all columns,
while the others (car and pedestrian) only have information regarding their
position and orientation. The input to the RL agent can be selected from the
information in the CSV files, complemented by information obtained from the
environment such as the buffer status.
We provide a baseline RL agent to demonstrate, via a simple example, how the episode data can be used. In the following paragraph, we describe the organization of this simplified dataset used by the baseline RL agent.
A complete episode file has information about all mobile objects in a scene
(all pedestrians, cars, etc.). To keep the baseline simple, we assumed that the
baseline RL agent uses only information from the three users (one of them being
simulation_pedestrian1). These are the user equipment
(UEs) being served by the base station (BS). Hence, the baseline data was
obtained by filtering the original episodes to discard the information about
all other mobile objects (which are scatterers, not UEs).
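The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the organizers' script: the column names (`timestamp`, `obj`, `pos_x`, ...), the miniature episode, and the identifier `simulation_pedestrian7` are all hypothetical. Only `simulation_pedestrian1` appears in the text.

```python
import csv
import io
from typing import Dict, Iterable, List, Set


def keep_users(rows: Iterable[Dict[str, str]], user_ids: Set[str],
               id_column: str = "obj") -> List[Dict[str, str]]:
    """Keep only the rows whose object identifier belongs to a served user.

    `id_column` is a hypothetical column name; adapt it to the real CSV header.
    """
    return [row for row in rows if row[id_column] in user_ids]


# Made-up miniature episode. Column names and simulation_pedestrian7 are
# assumptions used only to make the example runnable.
episode_csv = """timestamp,obj,pos_x,pos_y,pos_z
0.00,simulation_pedestrian1,1.0,2.0,0.0
0.00,simulation_pedestrian7,5.0,6.0,0.0
0.01,simulation_pedestrian1,1.1,2.0,0.0
"""

rows = csv.DictReader(io.StringIO(episode_csv))
baseline_rows = keep_users(rows, {"simulation_pedestrian1"})
```

With the real files, the set passed to `keep_users` would hold the identifiers of the three UEs, and all remaining objects (the scatterers) would be discarded.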
There are three stages of interest to PS-006 participants:
1. Creating the episodes for training or testing the RL agent. CaviarRenderer-ITU-v1 is executed to create the full episode data and store it as .csv files. This stage will be performed by the organizers, who will provide the required datasets.
2. With the full-episode files available, participants can use the RadioStrike-noRT-v1 environment to train their RL agents. RadioStrike-noRT-v1 is an RL environment written in Python and compliant with the OpenAI Gym API, which is supported by most RL libraries, such as Stable Baselines and RLlib. The baseline RL agent uses Stable Baselines, but participants can use any other RL library.
3. After training an RL agent, the participant feeds it the episode data as input and obtains the RL actions (output). After saving the output to a CSV file, the participant can use CaviarRenderer-ITU-v1 to render a complete episode and save it as a video file. The rendering is not performed in real time due to performance issues. The final video resembles a game in which the base station schedules users and scores points for doing so.
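The training and testing stages both rely on the Gym-style interaction loop between agent and environment. The sketch below illustrates that loop with a hypothetical stand-in class, `ToyBeamEnv`, written here only for illustration. It is not RadioStrike-noRT-v1, its observations and rewards are placeholders, and it assumes the classic four-value `step()` convention of OpenAI Gym. The action is assumed to be a pair (user index, beam index), with three users and 64 beam indices as mentioned elsewhere in this text.

```python
import random
from typing import Dict, List, Tuple

N_USERS = 3   # the baseline serves three users
N_BEAMS = 64  # the text mentions 64 beam indices


class ToyBeamEnv:
    """Hypothetical stand-in with a Gym-like reset()/step() interface."""

    def __init__(self, n_steps: int = 5, seed: int = 0) -> None:
        self.n_steps = n_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> List[float]:
        self.t = 0
        return [0.0]  # placeholder observation

    def step(self, action: Tuple[int, int]) -> Tuple[List[float], float, bool, Dict]:
        user, beam = action
        assert 0 <= user < N_USERS and 0 <= beam < N_BEAMS
        self.t += 1
        reward = self.rng.uniform(-2.0, 1.0)  # placeholder, not the PS-006 reward
        done = self.t >= self.n_steps
        return [float(self.t)], reward, done, {}


def run_episode(env: ToyBeamEnv, seed: int = 0) -> Tuple[List[Tuple[int, int]], float]:
    """Run one episode with a random policy and collect actions and return."""
    policy_rng = random.Random(seed)
    env.reset()
    actions: List[Tuple[int, int]] = []
    total_reward, done = 0.0, False
    while not done:
        # A trained agent would choose the action from the observation instead.
        action = (policy_rng.randrange(N_USERS), policy_rng.randrange(N_BEAMS))
        _obs, reward, done, _info = env.step(action)
        actions.append(action)
        total_reward += reward
    return actions, total_reward


actions, total_reward = run_episode(ToyBeamEnv(n_steps=5))
```

The list of actions collected this way is what stage 3 saves to a CSV file before rendering.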
The project will continue to evolve after the 2021 ITU Challenge ends, generating free and open-source code for CAVIAR simulations. Currently, users can use CaviarRenderer-ITU-v1 to create their own episodes, but this is not a supported feature during the 2021 ITU Challenge. Due to limited time, the PS-006 organizers will not provide support for using CaviarRenderer-ITU-v1 in stage 1 above, only in stage 3. Some planned future developments are:
Support for obtaining MIMO channels via ray-tracing [1, 2, 3]
Enable rendering while training the RL agent
Retrieve paired contextual (or out-of-band) information such as images and LIDAR data
Evolved versions of RadioStrike-noRT-v1 Gym environment
Future CAVIAR simulations will be increasingly realistic. Some of the simplifying assumptions used in PS-006 are:
The UEs have a single antenna. In future versions they will have an antenna array.
We have adopted DFT codebooks for the UPA at the base station and isotropic antenna elements. Hence, the radiation pattern has two main lobes (see Fig. 1 in [5]). For better visualization, CaviarRenderer-ITU-v1 shows only one lobe: the one with the smaller angular distance to the UE. In the future we can use arrays such as the one by 3GPP described in [6].
Instead of obtaining the MIMO channels via ray-tracing, we adopt for PS-006 a simpler approach based on the geometric channel model [7].
For assessing the RL agent, we will use the average return over the test episodes. The return G is the summation of the rewards, defined as a weighted sum of the cell throughput and dropped packets. The dataset with test episodes is disjoint from the one provided for training.
More specifically, the return G is defined as follows. At a given time t, assume:
#pkts_buffered => total number of packets (over all user buffers) available for transmission at time t
#pkts_transmitted => total number of successfully transmitted packets at time t
#pkts_dropped => total number of dropped packets at time t
The return is G = \sum_t reward[t], where the reward is given by:
if #pkts_buffered > 0:
    reward[t] = (#pkts_transmitted - 2 * #pkts_dropped) / #pkts_buffered
else:
    reward[t] = 0
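As a minimal sketch, the reward rule and the return G can be written in Python as below. The function names and the example packet counts are illustrative assumptions and not part of the provided code; only the formula itself comes from the definition above.

```python
from typing import Iterable, Tuple


def step_reward(pkts_buffered: int, pkts_transmitted: int, pkts_dropped: int) -> float:
    """Reward at one time step, following the rule stated above."""
    if pkts_buffered > 0:
        return (pkts_transmitted - 2 * pkts_dropped) / pkts_buffered
    return 0.0


def episode_return(steps: Iterable[Tuple[int, int, int]]) -> float:
    """Return G: the sum of the per-step rewards over one episode."""
    return sum(step_reward(b, tx, d) for b, tx, d in steps)


# Illustrative counts only, as (buffered, transmitted, dropped); not challenge data.
example_steps = [(10, 5, 0), (4, 2, 1), (0, 0, 0)]
G = episode_return(example_steps)
```

Note that a dropped packet costs twice as much as a transmitted packet earns, so the second example step, with two packets transmitted and one dropped, contributes zero reward.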
The participant must provide the predicted output for the test set as a CSV (comma-separated values) file, along with the code to reproduce the result. Each row corresponds to the RL agent action (output) for the given time instant. The participant must also provide two video files: a) one with the rendering of the RL agent for the episode with the largest return G, and b) another for the episode with the smallest return G. These two videos should correspond to the output of CaviarRenderer-ITU-v1 (or a later version). If desired, the participant can concatenate these videos into a single video. Another option is to extend the video (or videos) with material promoting the designed RL agent, describing its architecture or anything else the participant finds interesting. This promotional video must be recorded in Portuguese or English.
In case of a draw regarding the average return G among distinct participants, we will break it using the best video(s).
Models must be trained only with data included in the provided training episodes. Participants are not allowed to use additional data extracted from other datasets, generated by the participant, or obtained by data augmentation.
The inputs to the RL agent (the environment observation, also called
state) can be selected from the information provided in the CSV files
(position (x,y,z), speeds, etc.) and information obtained from the
environment such as the buffer status and channel information for the
specific beam index previously selected. Information that is unfeasible
to obtain in practice cannot be used. For instance, the simulator knows
the magnitude of the combined channels h_i for all indices, but in
practice, this would require a complete beam sweeping, with the
transmitter "trying" each of the 64 indices. This complete information
cannot be used as input to the agent. However, after a beam index j is
chosen by the agent at time t, the magnitude of the combined channel for
that index is available to be part of the input at the next time step.
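One way to respect this restriction is sketched below in Python. The observation layout, the class `BeamFeedback`, and both function names are assumptions made for illustration and are not taken from the baseline code. The point is only that the agent sees the channel magnitude of the single beam index it chose in the previous step, never the full set of 64 magnitudes.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class BeamFeedback:
    """Channel magnitude known only for the beam chosen at the previous step."""
    beam_index: Optional[int] = None  # None before the first action
    magnitude: float = 0.0


def reveal_feedback(chosen_beam: int, all_magnitudes: Sequence[float]) -> BeamFeedback:
    """Simulator side: expose the magnitude only for the chosen index j."""
    return BeamFeedback(beam_index=chosen_beam, magnitude=all_magnitudes[chosen_beam])


def build_observation(position: Sequence[float], buffer_occupancy: float,
                      feedback: BeamFeedback) -> List[float]:
    """Agent side: position, buffer status, and last-beam feedback only."""
    last_beam = -1.0 if feedback.beam_index is None else float(feedback.beam_index)
    return [*position, buffer_occupancy, last_beam, feedback.magnitude]


# Illustrative values: 64 made-up magnitudes known to the simulator only.
simulator_magnitudes = [float(i) for i in range(64)]
obs_t0 = build_observation((1.0, 2.0, 3.0), 4.0, BeamFeedback())
feedback = reveal_feedback(10, simulator_magnitudes)
obs_t1 = build_observation((1.0, 2.0, 3.0), 4.0, feedback)
```

Keeping the full magnitude list on the simulator side of the interface makes it harder to leak forbidden information into the agent input by accident.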
The participants must train models that learn from data using the reinforcement learning (RL) paradigm, without using “prior” information obtained from the provided code that generates the MIMO channels or other data. For instance, the MIMO channel has been generated to mimic a line-of-sight (LoS) situation, such that the best beam index is primarily determined by the user position (x,y,z). The participant cannot use such information or any other obtained via “reverse engineering” the provided software.
You can participate in teams of at most four people. The team members should be announced at the enrollment stage and will be considered to have an equal contribution.
The enrolled participants will receive access to upload files to a cloud storage server. Each team is required to upload:
CSV files with the RL agent actions (outputs) for each of the Ne test episodes. Each CSV file will have Ns rows, where Ns is the number of scenes per episode, and each row has two values: the scheduled user index and the corresponding precoding vector index.
A report describing the proposed solution (from one to a maximum of three pages), written preferably in English, although Portuguese is also allowed. This report will not be disclosed, so the participant can later publish it elsewhere if desired.
Videos (or a single video) as explained in the Evaluation section.
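The CSV files with the agent actions, listed first among the items above, can be produced with Python's standard `csv` module. The sketch below assumes one row per scene with the two values in the stated order and no header row. Whether a header is expected is not specified here, so please check with the organizers. The function name and the example actions are illustrative.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_actions(actions: Iterable[Tuple[int, int]], out: TextIO) -> int:
    """Write one row per scene: scheduled user index, precoding vector index.

    Returns the number of rows written, which should equal Ns.
    """
    writer = csv.writer(out, lineterminator="\n")
    n_rows = 0
    for user_index, beam_index in actions:
        writer.writerow([user_index, beam_index])
        n_rows += 1
    return n_rows


# Illustrative actions for three scenes; real files have Ns rows.
buffer = io.StringIO()
rows_written = write_actions([(0, 17), (2, 63), (1, 5)], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`, where `path` is a file name of your choice for each test episode.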
The participants responsible for the top-five submissions will be later requested to complement their submission with the following:
Source code of the proposed solution for the test stage, including the RL agent model and any required preprocessing stage.
Source code of the proposed solution for the training stage, including any required preprocessing stage.
In summary, the provided code and RL model must allow us to reproduce the reported results for the test set and to regenerate (retrain) the RL model from the training set.
* The requested code and any other material are just for review purposes. All participants keep their intellectual property regarding any submitted material.
To rank the models we will use the test dataset, which will be disclosed to the participants only after they submit their work.
All participants of the PS-006 ML5G-PHY [reinforcement learning] task are required to register at the ITU website.
The top-3 solutions of this challenge will have access to the global round of the ITU AI/ML in 5G Challenge and its prizes.
Besides, the top-3 teams in the PS-006 challenge will be recognized on this website and will receive certificates issued by LASSE / UFPA.
Preliminary dataset (to allow users to get familiar with the information retrieved from the RadioStrike-v0 environment) → July 01, 2021
Baseline reinforcement learning code (example to illustrate how to execute an RL agent for beam selection with the RadioStrike-v0 environment) → July 01, 2021
The first part of the training data (500 episodes) → July 23, 2021
The second part of the training data (all RL agents are required to be trained with the first and second parts of the dataset) → August 10, 2021
Release of test episodes to allow participants to calculate and report the performance of their agents → September 17, 2021
Final submission by participants of their RL agents → October 15, 2021
[1] Ailton Oliveira, Felipe Bastos, Isabela Trindade, Walter Frazão, Arthur Nascimento, Diego Gomes, Francisco Müller, Aldebaro Klautau. Simulation of Machine Learning-Based 6G Systems in Virtual Worlds. Submitted to: ITU Journal on Future and Evolving Technologies, 2021.
[2] Aldebaro Klautau, Ailton Oliveira, Isabela Trindade, Wesin Alves. Generating MIMO Channels For 6G Virtual Worlds Using Ray-tracing Simulations. IEEE Statistical Signal Processing Workshop, 2021. An arXiv version is available.
[3] Aldebaro Klautau, Pedro Batista, Nuria Gonzalez-Prelcic, Yuyang Wang, Robert W. Heath Jr. 5G MIMO Data for Machine Learning: Application to Beam-Selection using Deep Learning. Information Theory and Applications Workshop (ITA), Feb. 2018. DOI: 10.1109/ITA.2018.8503086. A PDF preprint is also available.
[4] The adopted scenario is a pack by Leartes Studios that includes 355 unique models. To protect the author's intellectual property, we follow their guidelines and do not distribute the 3-D models but only the corresponding executable.
[5] Mattia Rebato, Laura Resteghini, Christian Mazzucco, Michele Zorzi. Study of Realistic Antenna Patterns in 5G mmWave Cellular Scenarios. IEEE ICC Communications QoS, Reliability, and Modeling Symposium, 2018. A preprint is available.
[6] 3GPP TR 37.840 v12.1.0. Technical Specification Group Radio Access Network; Study of Radio Frequency (RF) and Electromagnetic Compatibility (EMC) requirements for Active Antenna Array System (AAS) base station. Tech. Rep., 2013. See the table "Composite array pattern for multiple columns".
[7] Robert W. Heath Jr., Nuria Gonzalez-Prelcic, S. Rangan, W. Roh, A. M. Sayeed. An overview of signal processing techniques for millimeter wave MIMO systems. IEEE Journal of Selected Topics in Signal Processing 10 (3), 436-453, 2016.