ITU Artificial Intelligence/Machine Learning in 5G Challenge

The Federal University of Pará (UFPA) invites you to participate in the 2021 ITU Artificial Intelligence/Machine Learning in 5G Challenge, a competition that is scheduled to run from now until the end of the year. Participation in the Challenge is free of charge and open to all interested parties from countries that are members of ITU. Detailed information about it can be found on the 2021 challenge website. In the subsequent sections, we present the ITU-ML5G-PS-006: ML5G-PHY-Reinforcement learning: scheduling and resource allocation.

Overview

Reinforcement learning (RL) is one of the most important learning paradigms in telecommunications. However, formulating a problem as an RL task and dealing with convergence issues are two practical obstacles that delay the widespread adoption of RL in 5G and future networks. The PS-006 problem statement challenges participants to design an RL agent capable of performing user scheduling and beam selection.

The problem is based on a simulation methodology named CAVIAR (Communication networks and Artificial intelligence immersed in Virtual or Augmented Reality) proposed in [2]. A CAVIAR simulation integrates three components: communication system + artificial intelligence (AI) + virtual reality (VR) or augmented reality (AR). In this ITU Challenge, the problem statement PS-006 is based on simulating a communication system immersed in a virtual world created with AirSim and Unreal Engine. In [2], ray-tracing [3] is adopted for generating realistic communication channels. In PS-006, ray-tracing is not used and the channels are created based on the geometric multiple-input multiple-output (MIMO) channel model. The problem is posed as a game that must be solved with reinforcement learning (RL).

The goal in PS-006 is to schedule and allocate resources to unmanned aerial vehicles (UAVs) and terrestrial users. The RL agent is executed at the base station. It periodically takes actions based on the information captured from the environment, which includes channel estimates, buffer status, and positions from a Global Navigation Satellite System such as GPS. The communication system model is based on a MIMO system, with the base station having a uniform planar array (UPA) while the users have a single antenna. The RL agent receives a reward based on the service provided to the users. The RL agent can be trained “offline”, without rendering the 3-D scenes. An example of the RL task and the 3-D environment [4] is illustrated in the following video:

Code and Links

The software can be found on our GitHub
Datasets are freely available on Nextcloud
Presentation Slide
Support Material

Organization of full episode data

RL tasks can be episodic or continuous. The PS-006 problem statement assumes the former category. The dataset is provided in text files using the .csv (comma-separated values) format. The data corresponding to the trajectories of all mobile objects in a complete episode is stored in a single file. These files can be found in the folder episodes_datasets (e.g. ep0.csv, ep1.csv, etc.). Each episode has a duration of approximately 3 minutes, with information stored with a sampling interval of 10 ms. Each CSV file is composed of the following columns:

timestamp	obj	pos_x	pos_y	pos_z	orien_w	orien_x	orien_y	orien_z	linear_acc_x	linear_acc_y	linear_acc_z	linear_vel_x	linear_vel_y	linear_vel_z	angular_acc_x	angular_acc_y	angular_acc_z	angular_vel_x	angular_vel_y	angular_vel_z

There are three different types of objects: uav, simulation_car and simulation_pedestrian. Only the uav type has information in all columns, while the others (car and pedestrian) only have information regarding their position and orientation. The input to the RL agent can be selected from the information in the CSV files, complemented by information obtained from the environment such as the buffer status.
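As a minimal sketch of how an episode file could be read, the Python snippet below groups the position samples by object. It is not part of the official code: it assumes a comma delimiter, a header row with the column names listed above, and that the obj column holds names such as uav1. The inline sample data and the helper name load_trajectories are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for an episode file such as ep0.csv (values are made up).
# Assumption: comma delimiter and a header row with the documented column names.
SAMPLE_EPISODE = """timestamp,obj,pos_x,pos_y,pos_z,orien_w,orien_x,orien_y,orien_z
0.00,uav1,1.0,2.0,30.0,1.0,0.0,0.0,0.0
0.00,simulation_car1,5.0,6.0,0.0,1.0,0.0,0.0,0.0
0.01,uav1,1.1,2.0,30.0,1.0,0.0,0.0,0.0
0.01,simulation_car1,5.2,6.0,0.0,1.0,0.0,0.0,0.0
"""


def load_trajectories(csv_file):
    """Group rows by object name, keeping (timestamp, x, y, z) per sample."""
    trajectories = defaultdict(list)
    for row in csv.DictReader(csv_file):
        sample = (
            float(row["timestamp"]),
            float(row["pos_x"]),
            float(row["pos_y"]),
            float(row["pos_z"]),
        )
        trajectories[row["obj"]].append(sample)
    return dict(trajectories)


trajectories = load_trajectories(io.StringIO(SAMPLE_EPISODE))

# An episode of about 3 minutes sampled every 10 ms has roughly
# 180 s / 0.010 s samples per object.
expected_samples = round(180 / 0.010)
```

With a real episode, the io.StringIO object would be replaced by an opened file from the episodes_datasets folder.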

Organization of baseline data

We provide a baseline RL agent to demonstrate, via a simple example, how the episode data can be used. In the following paragraph, we describe the organization of this simplified dataset used by the baseline RL agent.

A complete episode file has information about all mobile objects in a scene (all pedestrians, cars, etc.). To keep the baseline simple, we assumed that the baseline RL agent uses only information from the three users (uav1, simulation_car1 and simulation_pedestrian1). These are the user equipment (UEs) being served by the base station (BS). Hence, the baseline data was obtained by filtering the original episodes to discard the information about all other mobile objects (which are scatterers, not UEs).
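The filtering described above can be sketched as follows. This is only an illustration, not the script used by the organizers: the function name filter_ues is hypothetical, the CSV is assumed to have a header row, and simulation_car2 is a made-up name standing in for a scatterer.

```python
import csv
import io

# The three UEs served by the BS in the baseline.
UE_NAMES = {"uav1", "simulation_car1", "simulation_pedestrian1"}


def filter_ues(src, dst, ue_names=UE_NAMES):
    """Copy only the rows whose obj is a UE; return the number of rows kept."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if row["obj"] in ue_names:
            writer.writerow(row)
            kept += 1
    return kept


# Made-up full-episode excerpt with one scatterer (simulation_car2).
full_episode = (
    "timestamp,obj,pos_x,pos_y,pos_z\n"
    "0.00,uav1,1,2,30\n"
    "0.00,simulation_car1,5,6,0\n"
    "0.00,simulation_car2,7,8,0\n"
    "0.00,simulation_pedestrian1,9,9,0\n"
)
baseline = io.StringIO()
kept = filter_ues(io.StringIO(full_episode), baseline)
```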

Stages of PS-006 and future developments

There are three stages of interest to PS-006 participants:

Creating the episodes for training or testing the RL agent. The CaviarRenderer-ITU-v1 is executed to create the full episode data and store them as .csv files. This stage will be performed by the organizers and the required datasets will be provided.
With the full-episode files available, participants can use the RadioStrike-noRT-v1 environment to train the RL agents. RadioStrike-noRT-v1 is an RL environment written in Python and compliant with the OpenAI Gym API, which is supported by most RL libraries, such as Stable Baselines and RLlib. The baseline RL agent uses “Stable Baselines”, but the participants can use any other RL library.
After training an RL agent, the participant feeds it the episode data as input and obtains the RL actions (output). After saving the output to a CSV file, the participant can use the CaviarRenderer-ITU-v1 to render a complete episode and save it as a video file. The rendering is not performed in real time due to performance issues. The final video resembles a game in which the base station schedules users and scores points for doing so.
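The interaction between an agent and a Gym-style environment in stages 2 and 3 can be sketched as below. The ToyEnv class is a hypothetical stand-in that only mimics the classic Gym reset/step convention. It is not RadioStrike-noRT-v1, and its observation and reward are made up. The action is assumed to be a pair (user index, beam index), with three users as in the baseline and 64 beam indices.

```python
import random

N_USERS = 3   # uav1, simulation_car1 and simulation_pedestrian1 in the baseline
N_BEAMS = 64  # beam (codebook) indices 0..63


class ToyEnv:
    """Hypothetical stand-in mimicking the classic Gym reset/step interface."""

    def __init__(self, n_steps=5):
        self.n_steps = n_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0]  # placeholder observation

    def step(self, action):
        user, beam = action
        self.t += 1
        reward = 1.0 if beam == 0 else 0.0  # made-up reward, for illustration only
        done = self.t >= self.n_steps
        return [float(user), float(beam)], reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the list of actions and the return G."""
    obs = env.reset()
    actions, episode_return, done = [], 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        actions.append(action)
        episode_return += reward
    return actions, episode_return


def random_policy(_obs, rng=random.Random(0)):
    """Untrained placeholder policy: pick a random user and a random beam."""
    return rng.randrange(N_USERS), rng.randrange(N_BEAMS)


actions, episode_return = run_episode(ToyEnv(), random_policy)
```

A trained agent would replace random_policy, and the collected actions are what would be saved for rendering in stage 3.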

The project will continue to evolve after the 2021 ITU Challenge ends, generating free and open-source code for CAVIAR simulations. Currently, users can use CaviarRenderer-ITU-v1 to create their own episodes, but this is not a supported feature during the 2021 ITU Challenge. Due to limited time, the PS-006 organizers will not provide support for using CaviarRenderer-ITU-v1 in stage 1 above, only in stage 3. Some planned future developments are:

Support for obtaining MIMO channels via ray-tracing [1, 2, 3]
Enable rendering while training the RL agent
Retrieve paired contextual (or out-of-band) information such as images and LIDAR data
Evolved versions of RadioStrike-noRT-v1 Gym environment

Future CAVIAR simulations will be increasingly realistic. Some of the simplifying assumptions used in PS-006 are:

The UEs have a single antenna. In future versions they will have an antenna array.
We have adopted DFT codebooks for the UPA at the base station and isotropic antenna elements. Hence, the radiation pattern has two main lobes (see Fig. 1 in [5]). For better visualization, CaviarRenderer-ITU-v1 shows only one lobe, the one with the smaller angular distance to the UE. In the future we can use arrays such as the one by 3GPP described in [6].
Instead of obtaining the MIMO channels via ray-tracing, we adopt for PS-006 a simpler approach based on the geometric channel model [7].
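To make the DFT codebook idea concrete, the sketch below builds a 64-entry codebook for a UPA as the Kronecker product of two one-dimensional DFT codebooks. Several choices here are assumptions for illustration and are not taken from the provided software: the 8 x 8 array size, the sign convention of the exponent, the beam index ordering, and the definition of the combined channel as the inner product between codeword and channel vector.

```python
import cmath
import math


def dft_codeword(n_elements, k):
    """k-th unit-norm DFT beamforming vector for a linear array."""
    scale = 1 / math.sqrt(n_elements)
    return [
        scale * cmath.exp(-2j * math.pi * n * k / n_elements)
        for n in range(n_elements)
    ]


def upa_codebook(nx, ny):
    """Codebook with nx * ny beams; beam index is kx * ny + ky (assumed order)."""
    codebook = []
    for kx in range(nx):
        wx = dft_codeword(nx, kx)
        for ky in range(ny):
            wy = dft_codeword(ny, ky)
            # Kronecker product of the two one-dimensional codewords.
            codebook.append([a * b for a in wx for b in wy])
    return codebook


def combined_channel_magnitude(codeword, channel):
    """Magnitude of the inner product between a codeword and a channel vector."""
    return abs(sum(w.conjugate() * h for w, h in zip(codeword, channel)))


codebook = upa_codebook(8, 8)  # 64 beams, matching indices 0..63

# Made-up channel perfectly aligned with beam 5, to show beam selection.
channel = [2 * w for w in codebook[5]]
magnitudes = [combined_channel_magnitude(w, channel) for w in codebook]
best_beam = max(range(len(codebook)), key=magnitudes.__getitem__)
```

Because DFT codewords are mutually orthogonal, the aligned beam captures all of the channel energy in this toy case while the other beams capture essentially none.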

Evaluation

To assess the RL agent, we will use the average return over the test episodes. The return G is the sum of the rewards collected during an episode, where each reward is a weighted combination of the cell throughput and the number of dropped packets. The dataset with test episodes is disjoint from the one provided for training.

More specifically, the return G is defined as follows. At a given time t, assume:

#pkts_buffered => total (all user buffers) number of packets available for transmission at time t
#pkts_transmitted => total number of successfully transmitted packets at time t
#pkts_dropped => total number of dropped packets at time t

The return is G = \sum_t reward[t] where the reward is given by:

if #pkts_buffered > 0:
    reward[t] = (#pkts_transmitted - 2 * #pkts_dropped) / #pkts_buffered
else:
    reward[t] = 0
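The reward and return definitions above translate directly into Python. The sketch below is only an illustration of the formula: the function names and the example packet counts are made up, and each step is assumed to be given as a tuple (pkts_buffered, pkts_transmitted, pkts_dropped).

```python
def reward(pkts_buffered, pkts_transmitted, pkts_dropped):
    """Reward at one time instant, following the definition above."""
    if pkts_buffered > 0:
        return (pkts_transmitted - 2 * pkts_dropped) / pkts_buffered
    return 0.0


def episode_return(steps):
    """Return G: sum of the rewards over all time instants of an episode."""
    return sum(reward(*step) for step in steps)


def average_return(episodes):
    """Average of G over a collection of episodes (the evaluation metric)."""
    returns = [episode_return(steps) for steps in episodes]
    return sum(returns) / len(returns)


# Made-up example: (pkts_buffered, pkts_transmitted, pkts_dropped) per step.
steps = [(10, 4, 1), (0, 0, 0), (8, 8, 0)]
G = episode_return(steps)  # (4 - 2) / 10 + 0 + 8 / 8
```

Note that a dropped packet costs twice as much as a transmitted packet earns, so the reward can be negative when drops dominate.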

The participant must provide the predicted output for the test set as a CSV (comma-separated values) file along with the code to reproduce the result. Each row corresponds to the RL agent action (output) for the given time instant. The participant must also provide two video files: a) one with the rendering of the RL agent for the episode with the largest value of the return G, and b) another for the episode with the smallest value of G. These two videos should correspond to the output of CaviarRenderer-ITU-v1 (or a later version). If desired, the participant can concatenate these videos into a single video. Another option is to expand the video (or videos) by promoting the designed RL agent, describing its architecture or anything else the participant finds interesting. This promotional video must be recorded in Portuguese or English.

In case of a draw regarding the average return G among distinct participants, we will break it using the best video(s).

Rules

Models must be trained only with data included in the provided training episodes. Participants are not allowed to use additional data extracted from other datasets, generated by the participant, or obtained by data augmentation.

The inputs to the RL agent (the environment observation, also called state) can be selected from the information provided in the CSV files (position (x,y,z), speeds, etc.) and information obtained from the environment such as the buffer status and channel information for the specific beam index previously selected. Information that is infeasible to obtain in practice cannot be used. For instance, the simulator knows the magnitude of the combined channels h_i for all i=0,…,63 beam indices, but in practice, this would require a complete beam sweeping, with the transmitter “trying” each of the 64 indices. This complete information cannot be used as input to the agent. However, after a given index j is chosen by the agent at time t, the magnitude h_j of the combined channel is available to be part of the input at time t+1.
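The rule on channel information can be illustrated with a small sketch: of the 64 magnitudes known to the simulator, only the one for the beam chosen at time t may enter the observation at time t+1. The function names, the observation layout, and the per-user buffer counts below are hypothetical and are not the format used by the baseline.

```python
N_BEAMS = 64


def allowed_channel_feedback(all_magnitudes, chosen_beam):
    """Return only what the agent may see at t+1: (beam index j, |h_j|)."""
    return chosen_beam, all_magnitudes[chosen_beam]


def build_observation(position, buffer_status, feedback):
    """Concatenate position, buffer status and the single allowed magnitude."""
    beam, magnitude = feedback
    return list(position) + list(buffer_status) + [float(beam), magnitude]


# Made-up values: the simulator knows all 64 magnitudes at time t ...
all_magnitudes = [0.01 * i for i in range(N_BEAMS)]

# ... but the agent chose beam 17, so only that magnitude is passed on.
feedback = allowed_channel_feedback(all_magnitudes, chosen_beam=17)
observation = build_observation(
    position=(1.0, 2.0, 30.0),
    buffer_status=[5, 0, 2],  # assumed: packets waiting per user
    feedback=feedback,
)
```

Passing the whole all_magnitudes list to the agent would correspond to the complete beam sweep that the rule forbids.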

The participants must train models that learn from data using the reinforcement learning (RL) paradigm, without using “prior” information obtained from the provided code that generates the MIMO channels or other data. For instance, the MIMO channel has been generated to mimic a line-of-sight (LoS) situation, such that the best beam index is primarily determined by the user position (x,y,z). The participant cannot use such information or any other obtained via “reverse engineering” the provided software.

You can participate in teams of at most four people. The team members should be announced at the enrollment stage and will be considered to have an equal contribution.

The enrolled participants will receive access to upload files to a cloud storage server. Each team is required to upload:

CSV files with the RL agent actions (outputs) for each of the Ne test episodes. Each CSV file will have Ns rows, where Ns is the number of scenes per episode, and each row has 2 values: the scheduled user index and the corresponding precoding vector index.
A report describing the proposed solution (from one to a maximum of three pages) written preferably in English, although Portuguese is also allowed. This report will not be disclosed, so that the participant can later publish it elsewhere if desired.
Videos (or a single video) as explained in the Evaluation section.
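A possible way to write the actions file for one test episode is sketched below. The function name write_actions is hypothetical, and the snippet assumes that the file has no header row, that user indices range over the three baseline users, and that precoding vector indices range from 0 to 63. The official baseline code should be checked for the exact expected layout.

```python
import csv
import io


def write_actions(dst, actions, n_users=3, n_beams=64):
    """Write one row per scene: scheduled user index, precoding vector index."""
    writer = csv.writer(dst, lineterminator="\n")
    n_rows = 0
    for user, beam in actions:
        if not (0 <= user < n_users and 0 <= beam < n_beams):
            raise ValueError(f"invalid action ({user}, {beam})")
        writer.writerow([user, beam])
        n_rows += 1
    return n_rows  # should equal Ns, the number of scenes in the episode


# Made-up actions for an episode with Ns = 3 scenes.
actions = [(0, 12), (2, 63), (1, 0)]
output = io.StringIO()
n_rows = write_actions(output, actions)
```

For a real submission, io.StringIO would be replaced by a file opened with newline="" so that the csv module controls the line endings.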

The participants responsible for the top-five submissions will be later requested to complement their submission with the following:

Source code of the proposed solution for the test stage, including the RL agent model and any required preprocessing stage.
Source code of the proposed solution for the training stage, including any required preprocessing stage.

In summary, the provided code and RL model must allow us to reproduce the reported results for the test set and also to regenerate (retrain) the RL model based on the training set.

* The requested code and any other material are just for review purposes. All participants keep their intellectual property regarding any submitted material.

To rank the models we will use the test dataset, which will be disclosed to the participants only after they submit their work.

All participants of the PS-006 ML5G-PHY [reinforcement learning] task are required to register at the ITU website.

Final submissions and awards

Top-3 solutions of this challenge will have access to the global round of the ITU AI/ML in 5G Challenge, which has the following prizes among others:

1st prize: 5,000 CHF
2nd prize: 3,000 CHF
3rd prize: 2,000 CHF

Besides, the top-3 teams in the PS-006 challenge will be recognized on this website and will receive certificates issued by LASSE / UFPA.

Contact and updates

PS-006 ITU site

Timeline

Preliminary dataset (to allow users to get familiar with the information retrieved from the RadioStrike-v0 environment) → July 01, 2021
Baseline reinforcement learning code (example to illustrate how to execute an RL agent for beam selection with the RadioStrike-v0 environment) → July 01, 2021
The first part of the training data (500 episodes) → July 23, 2021
~~The second part of the training data (all RL agents are required to be trained with the first and second parts of the dataset) → August 10, 2021~~
Release of test episodes to allow participants to calculate and inform the performance of their agents → September 17, 2021
Final submission by participants of their RL agents → October 15, 2021

Organizers

Francisco Müller

Aldebaro Klautau

Developers and Support

João Borges

Felipe Bastos

Ailton Oliveira

Emerson Oliveira

Camila Novaes

Daniel Takashi

Lucas Matni

Rebecca Aben-Athar

References