Azure products are designed to mesh together well, and WordPress integration isn't much more work. This article provides a brief overview of how I scraped web data into Azure and then surfaced that data via WordPress during some learning and experimentation.
The high-level architecture is as follows:
- A timer-triggered Logic App periodically pulls the website/data source
- The Logic App parses the response and pushes the data into Azure Table Storage
- WordPress queries Azure Table Storage via the Azure Storage Table PHP Client Library
- The data is then charted on a custom page using free JavaScript charting software
Important: This is a quick-and-dirty experiment for learning purposes. The right architecture will vary by scenario, and the code shown here is not necessarily optimal. There are many approaches; one alternative would be to leverage Azure Time Series Insights, for example.
Identifying Data Sources
The very first order of business is to identify what data you wish to collect. Where does it reside? How is the data structured? Is it in a predictable location?
In my example, I looked at several local hospitals, each of which had a public emergency room wait time webpage. Wanting to aggregate this data, I dug in to find its root source.

I found a call to a back-end source that provided all the information I needed via a JSON response. This made things simple. Keep in mind that HTML data is also workable, provided the data resides in a consistent location within the DOM.
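To give a flavor, such a payload might look something like this (an invented example; the field names here are hypothetical, not the actual schema of my source):

{
    "hospitalId": "hospital-01",
    "name": "Example General",
    "minutesWait": 34,
    "lastUpdate": "2019-06-01T14:05:00Z"
}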
Identify your source data URLs and where, in the response, you want to parse values from. These will be used in the next section.
Creating Data Store
I spun up an Azure Storage V2 account to leverage Table Storage, a lightweight service for storing NoSQL data. There is no need for relational data here, and a schemaless design is helpful in case we want to expand on the data later.
When working with Table Storage, it's helpful to get familiar with partitioning strategies to scale your data. In this case, I opted for the following design:
- PartitionKey: the unique ID of the source data (in my case, the hospital)
- RowKey: a time-based key derived from the source's "lastUpdate" property, which helps ensure no duplicate data is stored (an example entity follows this list)
- MinutesWait: the reported wait time as of the last refresh of the source's underlying service
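To make the design concrete, a single entity might look like the following (values invented). The RowKey here is a reverse-tick value, which matches how the charting code later in this article decodes it:

PartitionKey: hospital-01
RowKey:       2518276703999999999
MinutesWait:  34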
Here is the table populated with data after crawling, which is discussed in the next section:

Note: I also created a Shared Access Signature (SAS); this is what authorizes WordPress to connect to the data.
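For the PHP client used later in this article, the SAS can be supplied via a connection string. A sketch with placeholder values (your account name and token will differ):

$connectionString = "TableEndpoint=https://myaccount.table.core.windows.net/;" .
    "SharedAccessSignature=sv=2019-02-02&ss=t&srt=sco&sp=r&sig=...";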
Crawling Data
Logic Apps are great for creating quick, no-code automated workflows, and they integrate seamlessly with storage services such as Azure Table Storage, so I opted for this route.
Crawling the data in its simplest form takes very little:
- A recurrence-triggered HTTP request to the URL to get the data; the trigger is where you set how often to query.
- A parse activity to grab what is needed. For HTML, this is doable by swapping this step for something such as a "Set variable" action and using string functions to parse the value out; this can be tricky without RegEx, but it's possible. Otherwise, you may look into creating an Azure Function and using that for more advanced parsing.
- An insert of the entity into Azure Table Storage (see the note on the RowKey after this list)
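A note on the RowKey: since the charting code later in this article decodes the key as .NET's maximum tick value minus the row key, the Logic App needs to store a reverse-tick key derived from "lastUpdate". In the workflow expression language, that can be built with something like the following ("Parse_JSON" is a hypothetical action name; adjust to your workflow):

sub(3155378975999999999, ticks(body('Parse_JSON')?['lastUpdate']))

Newer timestamps yield smaller keys, so the most recent readings sort first, and re-crawling an unchanged "lastUpdate" produces the same key rather than a new row.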

Lifting Data into WordPress
I found the Azure Storage Table PHP Client Library the easiest way to get up and running. It took some work, but once the library was installed I was able to pull the data in using the SAS created earlier. Below are some useful starting points for this library:
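The library installs via Composer, and its source and samples live on GitHub (package and repository names as of this writing; verify against Packagist if they have since moved):

composer require microsoft/azure-storage-table
https://github.com/Azure/azure-storage-table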
I then created a new WordPress template page and used the following code so that logic later in the page can fetch the wait time data.
<?php /* Template Name: App - Hospital Wait Times */
require_once __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use MicrosoftAzure\Storage\Table\TableRestProxy;
use MicrosoftAzure\Storage\Table\Models\QueryEntitiesOptions;
use MicrosoftAzure\Storage\Table\Models\Filters\Filter;

$connectionString = "SAS CONNECTION STRING HERE...";
$tableClient = TableRestProxy::createTableService($connectionString);
$mytable = 'hospData';

// Pull every entity updated in roughly the last 60 days, following
// continuation tokens since Table Storage returns results in pages.
function queryAllEntitiesInPartition($tableClient, $mytable)
{
    // OData datetime literals want a full ISO 8601 timestamp.
    $cutoff = gmdate("Y-m-d\TH:i:s\Z", strtotime("-60 days"));
    $filter = "Timestamp ge datetime'$cutoff'";

    $result = $tableClient->queryEntities($mytable, $filter);
    $entities = $result->getEntities();
    $nextPartitionKey = $result->getNextPartitionKey();
    $nextRowKey = $result->getNextRowKey();

    // Keep fetching pages until the service stops returning continuation tokens.
    while (!is_null($nextRowKey) && !is_null($nextPartitionKey)) {
        $options = new QueryEntitiesOptions();
        $options->setNextPartitionKey($nextPartitionKey);
        $options->setNextRowKey($nextRowKey);
        $options->setFilter(Filter::applyQueryString($filter));

        $result2 = $tableClient->queryEntities($mytable, $options);
        $newentities = $result2->getEntities();
        $nextPartitionKey = $result2->getNextPartitionKey();
        $nextRowKey = $result2->getNextRowKey();
        $entities = array_merge($newentities, $entities);
    }

    // Flatten each entity into a simple object that json_encode handles cleanly.
    $timeResults = array();
    try {
        foreach ($entities as $entity) {
            $timeResult = new stdClass();
            $timeResult->timestamp = $entity->getRowKey(); // reverse-tick time key
            $timeResult->minutes = $entity->getProperty("MinutesWait")->getValue();
            $timeResult->partition = $entity->getPartitionKey(); // hospital ID
            array_push($timeResults, $timeResult);
        }
    } catch (Exception $e) {
        echo "Error loading....";
        $timeResults = array(); // fall back to an empty set rather than returning the exception
    }
    return $timeResults;
}

get_header();
?>
Rendering Data
Any charting software can work here; however, I found that AMCharts worked wonders for handling inconsistently timed data. One thing to keep in mind is where to process the data; doing it server-side with caching may often make the most sense. For my example, I let JavaScript do the lifting to get it done quickly, simply dumping the deserialized data right into the JS.


The crude way to get the PHP data into JavaScript’s hands:
var data = JSON.parse('<?php echo json_encode(queryAllEntitiesInPartition($tableClient, $mytable)); ?>');
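If the data ever contains quotes or backslashes, that string-into-JSON.parse approach can break. A slightly sturdier variation on the same idea is to emit the JSON directly as a JavaScript literal, using PHP's hex-escaping flags to avoid breaking out of the script context:

var data = <?php echo json_encode(queryAllEntitiesInPartition($tableClient, $mytable),
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT); ?>;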
At this point, you have your data and just need to shape it for whatever chart you plan on using. For converting the tick-based row keys to dates and handling UTC offsets, I used date-fns and the following code:
// DateTime.MaxValue.Ticks in .NET; the RowKey stores (max - ticks) so newer
// entries sort first. Note this exceeds Number.MAX_SAFE_INTEGER, so the math
// below is approximate, which is close enough for charting.
const TICK_MAX_DATE_VAL = 3155378975999999999;
// Roughly the milliseconds between 0001-01-01 and the Unix epoch.
const EPOCH_MICROTIME_DIFF = 62135579038000; // Math.abs(new Date(0, 0, 1).setFullYear(1));

// Convert a reverse-tick RowKey back to a Date, shifted by hourAdj hours.
const tickToDate = (ticks, hourAdj) => {
    const tick_adj = TICK_MAX_DATE_VAL - ticks; // undo the reversal
    let tick_date = new Date(tick_adj / 10000 - EPOCH_MICROTIME_DIFF); // 1 tick = 100 ns
    tick_date = dateFns.addHours(tick_date, hourAdj);
    return tick_date;
};
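From there, each row just needs to become a chart point. A minimal sketch, reusing the hypothetical hospital ID from earlier and assuming a -5 hour UTC offset:

// Shape the rows for one hospital into {date, value} points and sort by time.
var points = data
    .filter((row) => row.partition === "hospital-01")
    .map((row) => ({
        date: tickToDate(Number(row.timestamp), -5), // RowKey holds reverse ticks
        value: row.minutes
    }))
    .sort((a, b) => a.date - b.date);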
Final Thoughts
From scraping data to integrating and displaying it, Microsoft Azure and WordPress can work well together, and in a short time. This article is meant for educational purposes only; if you plan on scraping data and releasing it publicly, ensure that you are not violating copyright or terms of service.