Executive Summary
The many Android applications (referred to as “apps” throughout this study) that utilize devices’ location acquisition capabilities share, based on the sample set detailed in this report, a forensically significant feature: they generate log and database files that vary in precision and volume but tend towards high granularity and trackability. Of the apps utilized in this research, all but one stored extensive location data (cell/base station IDs, location area codes, latitude/longitude pairs, etc.); in each of these cases the primary storage mechanism was one or more SQLite database files.
By acquiring these forensically valuable artifacts of geolocation and cell (base station)-enumeration activities from a mobile device running an Android operating system (OS), an investigator can recreate histories, itineraries, and paths of travel of the device (and thus its owner) with a fair degree of confidence; in some cases it may be possible to recreate these activities with a high degree of confidence. Although the data generated for this study were mostly derived from apps designed to record a mobile station’s (MS) interactions with cellular base transceiver stations (“towers”) and other network nodes, apps designed only to retrieve local weather statistics, for example, were also found to store forensically significant location data.
The present study demonstrates that many Android apps—certainly all of those tested in this study—that utilize and collect geographical coordinates via GPS and cell/base station data (CIDs, etc.) can be used to recreate a user’s travel habits and the areas in which he or she spends the most time, as well as specifically when (down to the second or millisecond) that user was in a given location. This suggestion is based on the finding that, in the case of each app utilized for this study, this user location/travel mapping was possible and easily achievable.
Goals
The primary goal of this project was to determine the extent of both geographical coordinate collection (via GPS) and mobile telephony system data (e.g., cell identification codes for base stations) that a sample of twenty apps generated. Of these apps, only five are presented in any detail throughout this report in the interest of preserving space and the patience of the reader. Ultimately, the scope of the analysis itself will be limited to the data generated by one of these apps (Tower Collector), as this program generated the most measurements and the most relevant data overall. The original scope covered about ten of these apps but this scope was found to be far too broad for the aims of this study.
The following sections will detail the hardware and software used in this project, how these were used to generate data for analysis, the acquisition of these data from an Android device (EXT4 filesystem), the analysis of these data itself, findings, and a concluding section.
Software & Hardware Utilized
Hardware Utilized
At the outset of this project it was determined that an Android device was needed that could be rooted but that would also be capable of running the appropriate applications (i.e., a fairly current device). After all, some Android devices have bootloaders that simply cannot be unlocked, like the Samsung S7 Qualcomm variants (e.g., the G930P), which is a lesson that this investigator learned from many, many wasted hours of flashing files with Odin. An affordable solution was found: a used Essential PH-1 device with stock Android 10 and the mata bootloader, which allows a simple fastboot oem unlock and fastboot flashing unlock. Essential was a short-lived company and smartphone make created by Andy Rubin, one of the co-founders of Android (Incorporated). The first and last model produced, the PH-1, is a mid-tier Android smartphone that bears some resemblances to the Google Pixel line, such as a quick update release cycle and “vanilla” Android. The successor to the PH-1, the PH-2, had been designed and was nearly ready for release but this was cancelled to the consternation of a small but enthusiastic fanbase. What made this device especially suitable for this project was that it, like the Pixel line, is (relatively) easy to root and also the fact that it would not be receiving any Android updates of any kind because none would be released by Essential (since the entire project had been shelved), making an accidental reversion to a non-rooted state much less likely (e.g., due to an accidental or forced OTA (over-the-air) update). Furthermore, this investigator found a good deal for a PH-1 in like-new condition.
Although a factory reset had apparently been performed by the previous owner, it was important that the device be in a known state prior to generating and recovering any data/artifacts from the filesystem. The process of rooting the device required several attempts before succeeding, which had the beneficial effect of ensuring that the device’s partitions were wiped multiple times (prior to each flashing operation). Since the purpose of this paper is not to provide instruction on rooting a PH-1 device the procedure will not be delineated here – the result is the important detail, i.e., that a so-called “system-less” root was achieved via the TWRP custom recovery and Magisk (with no extra modules installed/flashed).
A SIM card (i.e., a UICC) was obtained from Tracfone and a one-month activation paid for so that the project would reflect real conditions as closely as possible.
Software Utilized
Many apps that function primarily or largely via location data acquisition were used to generate data for this study, but those that generated the most relevant data and the most records (thus best suited to granular mapping) are presented below. Of the twenty different apps used for location data collection/generation in this study, the following five best represent the type of location data/artifacts recovered from the PH-1 Android device (q.v., below for details):
Network Cell Info v4.19.8 (M2Catalyst, LLC. [Wilysis])
This is the app that this investigator first used and the one that originally provoked interest in discovering what kind of forensically valuable artifacts are generated by apps that make heavy use of location data. A free “lite” version exists but the screen capture above shows the version used in this study. Like most of the apps featured in this section, Network Cell Info provided extremely detailed information about current and historical network connections, whether by WWAN/mobile, GPS, or Wi-Fi, including location area code (LAC) and Cell ID (CID), to specify just two data recorded by this app.
Tower Collector v2.2.4 (Adam Zamojski)
Tower Collector is aimed at users interested in contributing to the OpenCellID and Mozilla Location Services location database projects, both of which are accessible via a publicly available API. Although the present investigator did not (yet) upload the data generated by this app, Tower Collector was the most voluminous of data generators for this project, generating 4720 distinct records in the measurements and cell_signals tables of its measurements.db file.
The stats table of the same DB file shows the number of unique cells discovered, and the timestamp of the first record (Wed 26 February 2020 05:53:42.000 UTC):
Figure 1: Stats table of measurements database (Tower Collector)
WiGLE Wi-Fi Wardriving v2.53 (WiGLE.net)
This app recorded prolific geographical/location data but also recorded non-mobile network details, such as Wi-Fi SSIDs and BSSIDs (i.e., MAC addresses) and even Bluetooth data. Like several other apps of its kind, it offers or partners with a database operated on the Web and accessible via API, though WiGLE focuses on Wi-Fi and other non-mobile network types (e.g., personal area networks via Bluetooth, etc.).
OpenSignal v6.6.1-1 (Opensignal.com)
OpenSignal also generated a great deal of usable data and, although a good candidate for further analysis, was ultimately not chosen as an exemplar of the data collected (as Tower Collector was). Like many others featured here, OpenSignal allows user contributions of measurements and other network statistics and also provides a useful Wi-Fi/mobile network upload/download/ping test.
Mozilla Stumbler v1.8.8 (Mozilla)
Mozilla Stumbler was unique in that it stored its data in compressed (Gzip) JSON files:
Figure 2: Mozilla Stumbler data stored in (un-)archived JSON file
In this case, it generated 27 “report” GZ files, each of which contained either 1 or 50 entries. The naming convention for these files follows this pattern: report-t1583<xxxxxxxxx>-r[50|1]-w[0..52]-c[49..54] (where r identifies the record count, i.e., 50 or 1). Extracting the extension-less JSON files from the archives produced the above screen capture. Despite storing its data differently, Mozilla Stumbler generated data very much like those generated by the other apps of its kind featured in this study. It also acts as an interface to Mozilla’s Location Services database and allows user contributions.
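The report archives described above can also be read without extracting them by hand. The following is a minimal Python sketch, assuming only that each file is a gzip-compressed, UTF-8 encoded JSON document whose name begins with report-. The function names are illustrative, and because the internal layout of the JSON is not reproduced in this report, the parsed object is returned as-is rather than interpreted.

```python
import gzip
import json
from pathlib import Path


def load_report(path):
    """Decompress one gzip-compressed report file and parse it as JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def load_all_reports(directory):
    """Parse every file in a directory whose name starts with 'report-'.

    Returns a dict that maps each file name to its parsed JSON content.
    The 'report-' prefix follows the naming pattern described in the text.
    """
    reports = {}
    for path in sorted(Path(directory).glob("report-*")):
        reports[path.name] = load_report(path)
    return reports
```

Calling load_all_reports on a local copy of the app's report directory would return one parsed object per archive, which can then be inspected in any JSON viewer or further processed.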
Following are the programs/utilities used for analysis of artifacts recovered from the image, with those used for acquisition of data further down.
Tools used for post-acquisition analysis of data:
CyberChef v9.20.3
MiTec JSON Viewer v1.5.0.0
SQLite Studio v3.2.1
DB Browser for SQLite v3.11.2
SQLiteMan v1.2.2
MiTec SQLite Query v3.0.0.0
SQLite Forensic Explorer (Trial) v2.0
Autopsy v4.14 & v4.15 [used for initial exploration of PH-1 dd image filesystem]
Google Maps (Web) [for mapping of coordinates]
Google Earth (Web) [for mapping of coordinates]
Tableau Desktop v2020.1.2 [for mapping of coordinates]
Tools used for acquisition of data from Essential PH-1:
Essential’s custom version of adb/fastboot: Android Debug Bridge version 1.0.39 Revision 3db08f2c6889-android; fastboot version 3db08f2c6889-android, which includes three DLL files
Busybox v1.30.1 (Stephen (Stericson)): source of necessary GNU/Unix tools for use in an adb shell, such as nc and dd
Magisk v20.3 (20300): for achieving “system-less” root state for necessary filesystem access (e.g., to run dd and nc (netcat)) and for granting superuser rights to apps
TWRP custom recovery: by TeamWin, the de facto standard for custom recoveries; this enabled flashing of images and thereby allowed Magisk to be flashed
Hashdeep64 v4.4: used to generate MD5/SHA256 hash values for the relevant DB file local copies for comparison against source copies; the same files on the source media were piped into md5sum and sha256sum while in an ADB shell to generate initial hashes
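The hash comparison described for Hashdeep64 and md5sum/sha256sum can be illustrated in Python. This is a sketch of the general technique only, not the tooling used in this study: it computes both digests of a local copy and compares them with hexadecimal values recorded on the source device. The function names are illustrative.

```python
import hashlib


def file_digests(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) for a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def matches_source(path, source_md5, source_sha256):
    """Compare a local copy's digests with values recorded on the device."""
    md5_hex, sha256_hex = file_digests(path)
    return (md5_hex == source_md5.strip().lower()
            and sha256_hex == source_sha256.strip().lower())
```

Reading in chunks keeps memory use constant, which matters for large database files or full partition images.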
The great majority of the analytical work was done with the SQLite tools using the DB files recovered from the PH-1.
The following Web sites and APIs were used for mapping geographic coordinates:
CellMapper @ https://cellmapper.net/map (for verifying cell/tower data by MCC/MNC/LAC)
UnwiredLabs/OpenCellid LocationAPI @ https://unwiredlabs.com
Google Earth @ https://earth.google.com/
Generation of Data
The process of generating the data to be analyzed for this project was simple: install the requisite apps via Google Play, F-Droid, etc., grant permissions, then collect GPS/tower data while driving along normal routes that could easily be recognized by the investigator (in order to confirm the accuracy of the data). In this case, the routes were essentially eastern/western travel between Northern Virginia and Warren County, VA, with occasional but minor deviations from this route.
Although the primary and overwhelmingly favored data storage format used by these apps is SQLite DB (or SQLITE or no extension) files, many apps generated supplementary data in XML, JSON, or simple text files (usually with the LOG extension). Examples of this, aside from Mozilla Stumbler’s use of archived/compressed JSON files, are Cellular-Z’s use of CSV files for storage of data points (date, lat/long, type (e.g., WCDMA), LAC, CID, etc.) and WiGLE’s use of a CSV for storing additional and redundant (to the DB files used by the same app) information about Wi-Fi and Bluetooth connections/activity. Cellular-Z also stored ad-hoc network and location details, when requested by the user, to a text file named “Cellular-Z yyyymmdd.txt,” which included “cell” details (CID, LAC, type, etc.) and others such as ICCID, MCC, and MNC. Incidentally, Cellular-Z was unique in that its APK archive generated a significant number of positive hits at VirusTotal (8 out of 72 or so). It would have been interesting and instructive, but irrelevant to this project, to perform an analysis of the package and code, but this will be the topic of another project, perhaps. Other than the RECEIVE_BOOT_COMPLETED permission (which most of the five apps featured in this report also requested), there was no immediately suspicious permission listed in packages.xml for this app – for what this is worth.
OpenSignal also used such supplementary files, such as the composite_measurements.xml file, which contained a Base64 signature and specified an “artifact” schema, apparently as a model for its database records (though this is just a semi-educated guess). This XML file also featured two of the permissions requested by the app and the datatypes (text, int, etc.) required for storing information requested after these permissions were granted. Additionally, Network Cell Info stored general version and configuration data in a JSON file named currentInstallation, including such data as the version of SDK present, application name, last application start time/date (2020-05-09T16:50:10.585Z [Zulu == GMT]), locale (“en-US”), and creation time.
Figure 3: Example of GUI view of cell towers/BTSs (Network Monitor)
Acquisition of Data
Prior to logical acquisitions by means of, for the most part, adb (Essential’s version), the first task was to create a physical image of the filesystem. After identifying the block device name (sda) and the specific partition number for the userdata partition (sda16), the userdata partition was imaged by dd over netcat. The specific Android-side command used to initiate the image and send it over port 9999 to netcat on a Windows terminal was the following:
dd if=/dev/block/sda16 | nc -l -p 9999
Figure 4: Saving image from localhost:9999 (Windows; right) and sending image (Android/PH-1; left)
Figure 5: Initiating imaging over netcat at PH-1 shell
Figure 6: Redirecting input received over localhost via port 9999 into IMG file
This operation resulted in the following:
243843072+0 records in
243843072+0 records out
124847652864 bytes (116.3GB) copied, 3570.501618 seconds, 33.3MB/s
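The figures in this dd summary can be cross-checked arithmetically. The short Python sketch below assumes that dd used its default 512-byte block size (no bs= option appears in the command above) and that the "GB" and "MB/s" figures are binary units (GiB and MiB/s); both are assumptions rather than facts stated in the output.

```python
RECORDS_OUT = 243_843_072          # "records out" reported by dd
BLOCK_SIZE = 512                   # ASSUMPTION: dd default when bs= is not given
REPORTED_BYTES = 124_847_652_864   # byte count reported by dd
ELAPSED_SECONDS = 3570.501618      # elapsed time reported by dd


def image_size_bytes(records, block_size=BLOCK_SIZE):
    """Total bytes copied, given whole records of a fixed block size."""
    return records * block_size


def size_gib(total_bytes):
    """Size in binary gigabytes (GiB)."""
    return total_bytes / 2**30


def throughput_mib_per_s(total_bytes, seconds):
    """Average throughput in binary megabytes per second (MiB/s)."""
    return total_bytes / seconds / 2**20


# Records times block size reproduces the reported byte count exactly.
print(image_size_bytes(RECORDS_OUT) == REPORTED_BYTES)
print(round(size_gib(REPORTED_BYTES), 1))
print(round(throughput_mib_per_s(REPORTED_BYTES, ELAPSED_SECONDS), 1))
```

Under these assumptions the three reported values are mutually consistent, which is a quick way to confirm that the received image file has the expected size before analysis begins.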
Figure 7: Identifying partitions by displaying contents of /proc/partitions (sda16 == data partition)
For redundancy and possible ease of use later in the analysis stage, the investigator also collected all DB files directly via his personal favorite of the GNU tools: find. By running find from the /data/data directory of the PH-1 and passing each result to ls, a comprehensive list of appropriate DB files was compiled. The same command was then repeated with cp -R {} in place of ls so that these files would be copied to a location under /sdcard for easy retrieval after exiting the shell. This broad selection of DB files was, simply put, a backup.
Figure 8: Iteratively searching for all DB files from userdata partition (sda16) data directory
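Because these apps store SQLite data under several extensions (or none at all), a complementary way to locate database files in an extracted copy of the data directory is to test each file's header instead of its name. The following Python sketch is a different technique from the find-based search used in this study: it relies on the fact that SQLite 3 database files begin with the 16-byte string "SQLite format 3" followed by a null byte. Function names are illustrative.

```python
import os

# Every SQLite 3 database file starts with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if the file's first 16 bytes match the SQLite 3 header."""
    try:
        with open(path, "rb") as handle:
            return handle.read(16) == SQLITE_MAGIC
    except OSError:
        # Unreadable files are skipped rather than treated as errors.
        return False


def find_sqlite_files(root):
    """Walk a directory tree and return sorted paths of SQLite 3 files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.isfile(full_path) and is_sqlite_file(full_path):
                matches.append(full_path)
    return sorted(matches)
```

Header matching catches databases that have no extension, at the cost of opening every file in the tree.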
Some specific database files were occasionally pulled via adb pull operations or via a root shell, but the bulk of the analysis was done on the data acquired by imaging with dd.
Analysis of Data
The records in all of the recovered database files contained a timestamp field with an entry recorded in Unix timestamp format (milliseconds), such as 1582696734114 (Wed 26 February 2020 05:58:54.114 UTC). The Tower Collector app was among the first used during the data generation stage and, since it generated the most usable data, it will serve as the artifact exemplar for the analysis presented in this section as well as for the findings proffered in the following section.
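The timestamp conversion mentioned above is straightforward to reproduce. The Python sketch below converts a Unix timestamp in milliseconds to a UTC date string in the style used in this report; the function names are illustrative, and the day and month names assume the default (English) locale.

```python
from datetime import datetime, timedelta, timezone


def unix_ms_to_utc(ms):
    """Convert Unix epoch milliseconds to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=ms)


def format_utc(ms):
    """Format epoch milliseconds as e.g. 'Wed 26 February 2020 05:58:54.114 UTC'."""
    moment = unix_ms_to_utc(ms)
    millis = moment.microsecond // 1000
    return moment.strftime("%a %d %B %Y %H:%M:%S.") + f"{millis:03d} UTC"


print(format_utc(1582696734114))  # Wed 26 February 2020 05:58:54.114 UTC
```

Adding a timedelta to the epoch, rather than dividing by 1000 and using floating-point seconds, keeps the millisecond component exact.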
After the project data had been generated by the sample of Android apps used for this project (and with the combustion of many gallons of fuel), this investigator initially began exploring the image (of sda16) with major tools such as Axiom Examine (via Axiom Process) (trial license), Belkasoft Evidence Center (trial license), and Autopsy v4.14 (updated to v.4.15 mid-project), but it soon became clear that these powerful tools were simply not necessary. Such supererogatory analysis was a distraction because of the investigator’s zeal to explore his first Android image with tools that are normally beyond his reach (with the exception of Autopsy) – it was too easy to lose sight of the actual goal and pursue objectives extraneous to those of the project. Virtually all of the actual analytical work was done with the various SQLite and command-line tools enumerated in the Software/Hardware section above.
Mapping Towers with OpenCellid
The apps that allowed the user to view detailed cell tower details included the following critical data in their SQLite database files (or CSV/JSON, rarely): MCC, MNC, LAC, CID. These four values together hierarchically make up what is called the Cell Global Identity (CGI), which, in some mobile networks, uniquely describes a specific base transceiver station (BTS); other types of mobile networks concatenate similar values in an analogous way (Salem, 2018a, 2018b). A BTS is the radio device that a mobile station (i.e., the Android telephony-capable device) initially connects to when establishing communication with the mobile network and, as such, represents the finest level of granularity in the measurements made by these apps (with a location area code being a higher level of measurement/coverage) (Bair, 2018, p. 128). Some of these mobile network architecture values, necessary to successful communication between mobile stations and the networks, are stored in SIMs (i.e., UICCs), such as the location area identity (LAI) code, when such cards are used by the device (as is increasingly the case) (Bair, 2018, p. 144).
Using these values, accordingly, it is possible to determine the location of a “tower” contacted at a specific time (discovered_at) by providing these four values to a service such as OpenCellid via its API. Each successive value covers a narrower area: the mobile country code (MCC) is the broadest, the mobile network code (MNC) is still very broad, the location area code (LAC) represents a set of base stations and therefore a fairly restricted area, and the cell identification (CID) represents an individual base station (i.e., the narrowest level of coverage) (Salem, 2018b).
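Before a record is submitted to a lookup service, it is worth confirming that its four identifiers are plausible, since a record holding placeholder values (such as 2147483647, the maximum 32-bit signed integer) cannot be mapped. The Python sketch below shows one way to screen records and assemble the four values into a single label. The range limits and the dash-separated label format are assumptions made for illustration, not part of any standard or of the apps' own logic.

```python
INT32_MAX = 2**31 - 1  # 2147483647, seen in this data set as a placeholder value


def is_plausible_cell(mcc, mnc, lac, cid):
    """Rough sanity check on a (MCC, MNC, LAC, CID) tuple.

    ASSUMPTIONS: MCC and MNC are at most three digits; LAC fits in 16 bits
    (as in GSM; other network types use different field widths); CID is a
    non-negative value below the 32-bit placeholder.
    """
    if INT32_MAX in (mcc, mnc, lac, cid):
        return False
    return (0 < mcc <= 999
            and 0 <= mnc <= 999
            and 0 <= lac <= 0xFFFF
            and 0 <= cid < INT32_MAX)


def cgi_label(mcc, mnc, lac, cid):
    """Join the four values, broadest first, into a hypothetical display label."""
    return "-".join(str(value) for value in (mcc, mnc, lac, cid))
```

Ordering the label from MCC to CID mirrors the hierarchy described above, so that sorting labels groups records by country, then network, then location area.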
The first effective record [the first three actual records contained apparently invalid MCCs and other values, such as an MCC value of 2147483647, so they could not be mapped] in the cells table of the measurements.db file (Tower Collector) appears as:
This maps to a tower in Fauquier County, Virginia, according to the results of an OpenCellid API request, and occurred on/at Wed 26 February 2020 06:43:33.731 UTC:
The final GSM record in the cells table appears as:
These values identify a base station in the Winchester, Virginia area that was contacted on/at Sat 14 March 2020 19:27:48.297 UTC, according to OpenCellid:
Since these two terminal points suggest a round trip, perhaps a record from the middle of the table will reveal a tower in a different area. One of the records at location area code (LAC) 4615 appears as:
This maps to a tower in the vicinity of George Mason University’s Fairfax, VA campus, on/at Mon 2 March 2020 23:24:11.484 UTC:
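The selection of a first, middle, and last record, as done in this section, can be sketched with Python's built-in sqlite3 module. Only the table name cells and the field discovered_at are named in this report; the column names mcc, mnc, lac, and cid used below are assumptions for illustration and should be checked against the actual schema of measurements.db before use.

```python
import sqlite3


def first_middle_last_cells(conn):
    """Return the first, middle, and last plausible rows ordered by time.

    ASSUMPTION: the cells table has columns named mcc, mnc, lac, cid and
    discovered_at (Unix milliseconds). Rows whose MCC falls outside 1..999
    are skipped as unmappable.
    """
    rows = conn.execute(
        "SELECT mcc, mnc, lac, cid, discovered_at "
        "FROM cells "
        "WHERE mcc BETWEEN 1 AND 999 "
        "ORDER BY discovered_at"
    ).fetchall()
    if not rows:
        return None
    return rows[0], rows[len(rows) // 2], rows[-1]
```

With a working copy of the database, the function would be called as first_middle_last_cells(sqlite3.connect(path_to_copy)), always against a copy rather than the original evidence file.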
Permissions
Although the present author initially considered conducting static analyses of the executable files associated with each geolocation data-generating tool’s APK file (e.g., using APKTool), it became clear that any findings derived from such an analytical foray, while educational and interesting, would not be relevant enough to the goals of this project to warrant their inclusion. The packages.xml file from /data/system/ of the PH-1 image provided some insight into how the apps ran, or at least into how they could run.
Figure 9: Locating packages.xml file for recovery
Figure 10: Network Cell Info permissions request screen at first launch
Following are the specific permissions requested, according to each package’s manifest, by some of the apps that generated the most data and artifacts:
Network Cell Info:
<perms>
<item name="android.permission.RECEIVE_BOOT_COMPLETED" granted="true" flags="0" />
<item name="android.permission.INTERNET" granted="true" flags="0" />
<item name="com.android.vending.CHECK_LICENSE" granted="true" flags="0" />
<item name="com.android.vending.BILLING" granted="true" flags="0" />
<item name="android.permission.CHANGE_WIFI_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_WIFI_STATE" granted="true" flags="0" />
</perms>
OpenSignal:
<perms>
<item name="com.google.android.finsky.permission.BIND_GET_INSTALL_REFERRER_SERVICE" granted="true" flags="0" />
<item name="com.google.android.c2dm.permission.RECEIVE" granted="true" flags="0" />
<item name="android.permission.RECEIVE_BOOT_COMPLETED" granted="true" flags="0" />
<item name="android.permission.INTERNET" granted="true" flags="0" />
<item name="android.permission.CHANGE_WIFI_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_WIFI_STATE" granted="true" flags="0" />
<item name="android.permission.WAKE_LOCK" granted="true" flags="0" />
</perms>
WiGLE:
<perms>
<item name="com.google.android.providers.gsf.permission.READ_GSERVICES" granted="true" flags="0" />
<item name="android.permission.CHANGE_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.FOREGROUND_SERVICE" granted="true" flags="0" />
<item name="android.permission.RECEIVE_BOOT_COMPLETED" granted="true" flags="0" />
<item name="android.permission.BLUETOOTH" granted="true" flags="0" />
<item name="android.permission.INTERNET" granted="true" flags="0" />
<item name="android.permission.BLUETOOTH_ADMIN" granted="true" flags="0" />
<item name="android.permission.CHANGE_WIFI_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_WIFI_STATE" granted="true" flags="0" />
<item name="net.wigle.wigleandroid.permission.MAPS_RECEIVE" granted="true" flags="0" />
<item name="android.permission.WAKE_LOCK" granted="true" flags="0" />
</perms>
Mozilla Stumbler:
<perms>
<item name="android.permission.CHANGE_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.INTERNET" granted="true" flags="0" />
<item name="android.permission.CHANGE_WIFI_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.ACCESS_WIFI_STATE" granted="true" flags="0" />
<item name="android.permission.WAKE_LOCK" granted="true" flags="0" />
</perms>
Tower Collector:
<perms>
<item name="android.permission.FOREGROUND_SERVICE" granted="true" flags="0" />
<item name="android.permission.RECEIVE_BOOT_COMPLETED" granted="true" flags="0" />
<item name="android.permission.INTERNET" granted="true" flags="0" />
<item name="android.permission.ACCESS_NETWORK_STATE" granted="true" flags="0" />
<item name="android.permission.WAKE_LOCK" granted="true" flags="0" />
</perms>
It is apparent, from comparing the permissions requested by each of these apps, that some permissions are generally necessary to their operation and ability to function as intended, e.g., android.permission.ACCESS_NETWORK_STATE, while others, such as android.permission.CHANGE_NETWORK_STATE, are requested by some but not by others. This study represents a novice investigator’s first attempt at analysis of an Android image, so the reader likely understands the implications of these permissions better than the author.
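The comparison of permission sets described above can be automated. The following Python sketch parses <perms> blocks of the form shown in this section and reports which granted permissions are shared by every app and which are particular to each. The function names are illustrative; the XML structure is taken from the excerpts above.

```python
import xml.etree.ElementTree as ET


def granted_permissions(perms_xml):
    """Return the set of permission names marked granted="true" in a <perms> block."""
    root = ET.fromstring(perms_xml)
    return {
        item.get("name")
        for item in root.iter("item")
        if item.get("granted") == "true"
    }


def compare_permissions(perms_by_app):
    """Split permissions into those common to all apps and the remainder per app.

    perms_by_app maps an app name to its set of permission names.
    Returns (common, extras) where extras maps each app to the
    permissions it holds beyond the common set.
    """
    common = set.intersection(*perms_by_app.values())
    extras = {app: perms - common for app, perms in perms_by_app.items()}
    return common, extras
```

Applied to the five blocks listed above, the common set would contain only the permissions that appear in every listing, which makes the per-app differences easier to read than a side-by-side visual comparison.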
Findings
As suggested above and throughout, most of the apps sampled for this study produced enough location/cell data/records to allow an investigator to chart the device’s (i.e., user’s) movement through geographic points along a map, as well as to identify which “towers,” LACs, etc. were involved in the active connection at the time of measurement. In the interest of space and time, this section will present such a mapping that has been done with the data extracted from the artifacts created by Tower Collector. Because Tower Collector generated the most data (cell IDs, latitude/longitude pairs, etc.)—far more than any other app sampled—the mappings presented below best represent the path taken by the investigator while actively using that app over a period ranging from February 26, 2020 to March 14, 2020 (see previous section). The route was nearly identical for each of the sampled apps, with some simultaneously recording GPS/network data. Those apps that were tested at different times may have contacted different towers depending on environmental and other conditions, even with the same route followed as for other apps’ data generation runs, but that is beyond the present scope.
The first mapping shows the path comprising the thousands of discrete measurements by Tower Collector, with density indicated by red, yellow, and green shades:
Figure 11: Density map created with Tableau Desktop
At a glance one can see that there are two high-density areas, with an extended medium-density (yellow-green) trail between the two areas. In this case, what is meant by “density” is the relative concentration of measurements in space over time. Charted in this way, the data clearly suggest that the owner of the device spent significant amounts of (relative) time in the westernmost high-density area and in the easternmost high-density area, with a significant amount of movement between the two points (the yellowish path between the red termini).
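The notion of density used here can be made concrete by counting measurements per grid cell. The Python sketch below is a simplified stand-in for what a mapping tool does internally, not a description of Tableau's method: it bins latitude/longitude pairs into squares of a chosen size (0.01 degrees by default, an arbitrary choice) and ranks the squares by the number of measurements they contain.

```python
import math
from collections import Counter


def density_grid(points, cell_deg=0.01):
    """Count measurements per grid square.

    points is an iterable of (latitude, longitude) pairs in decimal degrees.
    Each key in the result is (row, column), the integer index of the square
    that contains the point; each value is the number of points in it.
    """
    counts = Counter()
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[key] += 1
    return counts


def densest_squares(counts, n=2):
    """Return the n grid squares with the most measurements."""
    return counts.most_common(n)
```

For a data set like the one charted above, the two top-ranked squares would be expected to correspond to the two red termini, with lower counts along the path between them.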
Using Google Earth (Web version), the same set of coordinates was mapped to provide a lower-level view of the travel habits of the owner of the device (i.e., the author of this report). The heat map generated with Tableau Desktop provides a high-level overview of these data that is then complemented by the lower-level view provided by the Google Earth mapping. Together, as will become evident, such mappings are pregnant with significance and could be valuable in an actual investigation.
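One common way to load a set of coordinates into Google Earth is a KML file. How the coordinates were imported for this study is not detailed here, so the Python sketch below is offered only as an illustration: it writes one placemark per measurement, using the KML 2.2 namespace and the longitude,latitude,altitude ordering that KML requires. The function name and the label field are illustrative.

```python
from xml.sax.saxutils import escape


def build_kml(points, name="measurements"):
    """Build a KML document string with one Placemark per point.

    points is an iterable of (latitude, longitude, label) tuples.
    Note that KML lists longitude before latitude in <coordinates>.
    """
    placemarks = []
    for lat, lon, label in points:
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(str(label))}</name>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">'
        f"<Document><name>{escape(name)}</name>"
        + "".join(placemarks)
        + "</Document></kml>\n"
    )
```

Using each record's formatted timestamp as the label would let an examiner click a point on the map and read when the device was there.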
The following three screen captures represent, respectively: the eastern high-density area in the heat map, the western high-density area, and finally the westernmost point where measurements were made. In the third screen capture below, the same crook-shaped terminus is visible as is seen in the heat map above. In this view, however, one can see that the device owner appears to have taken different roads or specific paths despite following the same broad path. In the case of this data set, these separate paths simply indicate westward and eastward traffic involving exits 13 and 18 of Route 66 in Virginia.
Figure 12: Detail of the farthest east high-density (red) area
Figure 13: Detail of the farthest west high-density (red) area
[A map image in the original report is here redacted for privacy]
Conclusion
As demonstrated in the previous sections, an investigator can successfully extract and map both cell/tower and GPS data gathered by apps running on a modern Android smartphone, with the result being either a coarse location (if relying on the tower data) or a precise location (coordinates). These points can be charted and mapped along a path and correlated with their respective timestamps to create a clear picture of when, where, and whence the owner of an Android device has traveled. In the present case, the route taken was straightforward, as evident in the Tableau depiction (Figure 11). A dataset of MCC/MNC/LAC/CID groups and/or coordinates covering a much larger span of time and more varied routes would likely yield even more investigatory value.
These tower/geographical data can be obtained from a device by an app that has been granted relatively few permissions, as is the case with Mozilla Stumbler, though, as in the case of WiGLE, many permissions can be requested and potentially granted for similar purposes.
References
Bair, J. (2018). Seeking the truth from mobile evidence: Basic fundamentals, intermediate and advanced overview of current mobile forensic investigations. Academic Press.
Salem, M. (2018a, October 3). Your Guide to GPRS network architecture—Mobile Packet Core Architecture. Mobile Packet Core. https://mobilepacketcore.com/gprs-network-architecture/
Salem, M. (2018b, October 5). PLMN, LAC, and RAC - GPRS network identifiers. Mobile Packet Core. https://mobilepacketcore.com/plmn-lac-rac-gprs/