Tracking people in crowded scenes

Publication Type:

Conference Paper


PED - Pedestrian and Evacuation Dynamics (2012)


Pedestrian tracking; crowded scene; pedestrian dynamics


For the proper understanding and modelling of pedestrian dynamics, reliable empirical data is necessary for analysis and verification. Collecting the trajectories of every person with a high temporal and spatial resolution allows a detailed analysis of movement and the calibration and verification of microscopic models in space and time [Steffen2011, Chraibi2010].
In recent years we have performed an extensive series of well-defined experiments with up to 350 people to study the movement of pedestrians in different situations [Seyfried2010, Holl2009]. These laboratory experiments give us the opportunity to analyse parameters of interest under controlled conditions. Varying the setup allows a survey of a parameter range, e.g. for the bottleneck width or length, or the density inside a corridor. Parameters such as the density can be set to values seldom seen in field studies (e.g. very high densities). For such experiments the characteristics of the test persons (e.g. culture, fitness, age, gender, body height) can be determined.
For the analysis of these experiments we have developed software to automatically extract trajectories from video recordings of marked people on plane ground [Boltes2010] and uneven terrain [Boltes2011]. The program is able to handle lens distortion and high pedestrian densities. For experiments on stairs, for example, but also for experiments on plane ground, stereo recordings are needed to obtain spatial trajectories and to take the perspective distortion into account.
Despite the above-mentioned advantages, experiments under laboratory conditions also have drawbacks. The number of experiments is limited by the costs of the test persons and of building the artificial environments. Thus the variance of the studied parameter is limited. Also, the combination of differences inside a detected group cannot be covered by laboratory experiments.
Therefore we present a new approach to detect pedestrians without markers, also in crowded scenes, to facilitate field studies and the easier realization of moderated experiments in real environments. The newly introduced method is based on the analysis of the depth field of stereo recordings taken from overhead of the pedestrians. The overhead recordings perpendicular to the floor allow a view without occlusion for a range of body heights, so that microscopic detection and tracking can be performed without estimating the persons' route.
A lot of work has already been done in the field of pedestrian detection. Most of the approaches are designed for monocular cameras and slanted views, such as those from surveillance cameras, and for densities that result only in temporary or partial occlusion. One of the best results for these scenarios can be found in [Schwartz2009]. Existing techniques for trajectory extraction with stereo cameras, such as [Harville2004], depend on accurate segmentation of foreground objects. For dense crowds such as in our experiments these methods are not applicable or could only detect groups of people.
Our extraction method also copes with crowded scenes. To speed up and simplify the extraction step, it uses the perspective depth field directly and does not use laborious plan-view statistics. The depth field contains the distance to the camera for every pixel and is inversely proportional to the disparity map, which describes, for every pixel, the pixel offset between the two views of the stereo camera.
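The inverse relation between depth and disparity can be illustrated with a minimal sketch. It assumes a rectified stereo pair, for which the standard relation Z = f * B / d holds; the function name, the focal length, and the baseline below are hypothetical values chosen for illustration and are not taken from the paper.

```python
def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (pixels) into a depth map (metres).

    Assumes rectified stereo, so depth Z = focal_px * baseline_m / d.
    Pixels with a missing or non-positive disparity get None (depth unknown).
    """
    depth = []
    for row in disparity:
        depth.append([
            focal_px * baseline_m / d
            if d is not None and d > min_disparity
            else None
            for d in row
        ])
    return depth


# Hypothetical camera: 800 px focal length, 0.125 m baseline.
disparity = [[50.0, 25.0],
             [0.0, 20.0]]
depth = disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.125)
```

Halving the disparity doubles the depth, which is why distant floor pixels are resolved more coarsely than heads close to an overhead camera.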
The new method works as follows. People are identified using only the shape of the top part of their body, especially the head and shoulders. To identify people only by their shape, a background subtraction has to be performed beforehand to reduce the number of false positive detections. Pixels are treated as part of the background, and are thus ignored in the detection process, if their distance to a perspective depth field of the background is smaller than a threshold value. The perspective depth field of the background is either set once with the scene deserted or set to a cautiously adapted maximum distance over all frames.
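The background test described above can be sketched as a per-pixel comparison against a stored background depth field. This is only an illustration under stated assumptions: the function name, the threshold value, and the choice to treat pixels without a depth value as background are hypothetical and not specified in the text.

```python
def foreground_mask(depth, background, threshold):
    """Mark pixels whose depth differs from the background depth field.

    A pixel is background (False) if its distance to the background depth
    is smaller than `threshold`. Pixels with unknown depth (None) are also
    treated as background here, which is an assumption of this sketch.
    """
    mask = []
    for depth_row, bg_row in zip(depth, background):
        mask.append([
            d is not None and abs(bg - d) >= threshold
            for d, bg in zip(depth_row, bg_row)
        ])
    return mask


# Hypothetical scene: camera 3 m above the floor, recorded once while deserted.
background = [[3.0, 3.0, 3.0]]
frame = [[2.95, 1.3, None]]  # floor noise, a head, a pixel without depth
mask = foreground_mask(frame, background, threshold=0.2)
```

With these values only the middle pixel, which is 1.7 m closer to the camera than the floor, is kept as foreground.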
To identify pedestrians by means of the depth field we determine directed isolines of the same distance to the camera at equidistant depth levels for the upper body part. Beforehand, the depth field is adapted by replacing values covered by the background mask with the furthest value belonging to the foreground.
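The adaptation of the depth field and the slicing at equidistant depth levels can be sketched as follows. Note the substitution: instead of tracing directed isolines, this sketch collects the 4-connected regions of pixels closer to the camera than each level, whose boundaries correspond to the isolines. All function names and the level spacing are hypothetical.

```python
from collections import deque


def adapt_depth(depth, mask):
    """Replace background pixels with the furthest foreground depth."""
    foreground = [d for d_row, m_row in zip(depth, mask)
                  for d, m in zip(d_row, m_row) if m]
    if not foreground:
        return [row[:] for row in depth]
    furthest = max(foreground)
    return [[d if m else furthest for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]


def regions_at_level(depth, level):
    """Return 4-connected regions of pixels with depth <= level."""
    rows, cols = len(depth), len(depth[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or depth[r][c] > level:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and depth[ny][nx] <= level):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def level_stack(depth, start, step, count):
    """Slice the depth field at equidistant levels, nearest level first."""
    return [regions_at_level(depth, start + i * step) for i in range(count)]


# Hypothetical frame with two heads above a floor at 3 m.
depth = [[3.0, 1.5, 1.5, 3.0, 1.6, 3.0],
         [3.0, 1.5, 1.3, 3.0, 1.4, 3.0]]
mask = [[d < 2.5 for d in row] for row in depth]
adapted = adapt_depth(depth, mask)
stack = level_stack(adapted, start=1.35, step=0.1, count=3)
region_counts = [len(regions) for regions in stack]
```

Scanning from the nearest level outwards, the first level touches only the taller person, and the second level already separates both heads into distinct regions.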
Isolines that enclose a number of pixels between a minimum and a maximum, and that have a small ratio between the length of the isoline and the enclosed area (to eliminate isolines with big dents), are approximated by ellipses. The ellipses allow easier access to the global shape. By scanning the depth field from the head downwards, a pyramid of ellipses is built up for the upper body part of every person. These measured pyramidal ellipse stacks (PES) of people are matched against a variety of people models, where the perspective view has to be taken into account. The PES are compared to ellipse stacks that we generate from synthetic models by raytracing a virtual scene simulating the depth sensing of a stereo camera.
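The isoline filter and the ellipse approximation can be sketched as below. Two choices are assumptions of this sketch and not taken from the paper: the length-to-area criterion is expressed as the scale-invariant ratio perimeter² / (4π · area), which equals 1 for a circle and grows with dents, and the ellipse is estimated from the second moments of the isoline vertices, which presumes roughly evenly sampled points. All names and thresholds are hypothetical.

```python
import math


def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon [(x, y), ...]."""
    twice_area, perimeter = 0.0, 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def fit_ellipse(points):
    """Moment-based ellipse: centre, semi-major, semi-minor, orientation."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    mean = (sxx + syy) / 2.0
    diff = math.hypot((sxx - syy) / 2.0, sxy)
    # Boundary points of an ellipse have variance a^2 / 2 along each axis.
    major = math.sqrt(2.0 * (mean + diff))
    minor = math.sqrt(2.0 * max(mean - diff, 0.0))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), major, minor, angle


def accept_isoline(points, min_area, max_area, max_compactness):
    """Return a fitted ellipse, or None if the isoline is rejected."""
    area, perimeter = polygon_area_perimeter(points)
    if area <= 0.0 or not (min_area <= area <= max_area):
        return None
    compactness = perimeter ** 2 / (4.0 * math.pi * area)
    if compactness > max_compactness:
        return None  # too many or too deep dents
    return fit_ellipse(points)


# A smooth head-like isoline and a heavily dented one (both synthetic).
smooth = [(100 + 30 * math.cos(2 * math.pi * k / 72),
           50 + 20 * math.sin(2 * math.pi * k / 72)) for k in range(72)]
dented = [((30 if k % 2 == 0 else 10) * math.cos(2 * math.pi * k / 16),
           (30 if k % 2 == 0 else 10) * math.sin(2 * math.pi * k / 16))
          for k in range(16)]

ellipse = accept_isoline(smooth, min_area=500, max_area=5000, max_compactness=1.3)
rejected = accept_isoline(dented, min_area=500, max_area=5000, max_compactness=1.3)
```

Stacking the ellipses accepted at successive depth levels for one person would then give a structure comparable to the PES described above.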
The PES resulting from a measured disparity map are unstable, and thus the centre of the topmost ellipse is not a good estimate of the centre of the head. To stabilize the procedure and thus obtain smoother trajectories we utilize the distorted axis of the PES for a more settled centre. The smoothness of trajectories resulting from people detection with and without markers is compared in the paper, because it is important for further analysis (e.g. instantaneous velocity) of the trajectory data.
A strict rejection of poorly fitting PES avoids false detections, since tracking does not require a person to be detected in every frame. For tracking the pedestrians, the robust pyramidal iterative Lucas-Kanade feature tracker is used to link the same detected pedestrian in successive frames or to bridge frames in which a specific pedestrian could not be located.
The tracking results exceed the results of all former methods for tracking of markerless pedestrians. Besides this, the markerless detection can improve the robustness of the marker-based detection, as detected markers not lying on an elevation describing a person's head can be rejected.