Modifying Azure Table Storage Data using FSharp.Azure

May 08, 2014 1:23 PM by Daniel Chambers (last modified on May 24, 2014 9:53 AM)

In my previous post I gave a quick taster of how to modify data in Azure table storage using FSharp.Azure, but I didn’t go into detail. FSharp.Azure is the new F# library that I’ve recently released that lets you talk to Azure table storage using an idiomatic F# API surface. In this post, we’re going to go into deep detail about all the features FSharp.Azure provides for modifying data in table storage.

Getting Started

To use FSharp.Azure, install the NuGet package: FSharp.Azure. (When this post was first written the package was marked as beta and you had to include pre-releases to install it; v1.0.0 has since been released, so that is no longer necessary.)

Once you’ve installed the package, you need to open the TableStorage module to use the table storage functions:

open DigitallyCreated.FSharp.Azure.TableStorage

Compatible Types

In order to provide an idiomatic F# experience when talking to Azure table storage, FSharp.Azure supports the use of record types. For example, this is a record you could store in table storage:

type Game = 
    { Name : string
      Developer : string
      HasMultiplayer : bool
      Notes : string }

Note that the record fields must be of types that Azure table storage supports; that is:

string
int
int64
bool
double
Guid
DateTimeOffset
byte[]

In addition to record types, you can also use classes that implement the standard Microsoft.WindowsAzure.Storage.Table.ITableEntity interface.

For the remainder of this post however, we will focus on using record types.

Specifying the Partition Key and Row Key

One of the design goals of the FSharp.Azure API is to ensure that your record types are persistence independent. This is unlike the standard ITableEntity interface, which forces you to implement the PartitionKey and RowKey properties. (Therefore, if you're using that interface, you don't need to do any of the things in this section.)

However, FSharp.Azure still needs to be able to derive a Partition Key and Row Key from your record type in order to insert it into table storage (or to replace, merge or delete it). There are three ways of setting this up:

Attributes

You can use attributes to specify which of your record fields are the PartitionKey and RowKey fields. Here's an example:

type Game = 
    { [<RowKey>] Name : string
      [<PartitionKey>] Developer : string
      HasMultiplayer : bool
      Notes : string }

The IEntityIdentifiable interface

Sometimes you need to be able to have more control over the values of the Partition Key and Row Key. For example, if we add a Platform field to the Game record type, we will need to change the RowKey, or else we would be unable to store two Games with the same Name and Developer, but different Platforms.

To cope with this situation, you can implement an interface on the record type:

type Game = 
    { Name: string
      Developer : string
      Platform: string
      HasMultiplayer : bool
      Notes : string }
    interface IEntityIdentifiable with
        member g.GetIdentifier() = 
            { PartitionKey = g.Developer; RowKey = sprintf "%s-%s" g.Name g.Platform }

In the above example, we've derived the Row Key from both the Name and Platform fields.
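To see why the composite Row Key matters, here is a minimal self-contained sketch. It does not use FSharp.Azure: the Identifier record, the GameKeyDemo record and the identify function are hypothetical stand-ins invented for illustration, mirroring the GetIdentifier implementation above.

```fsharp
// Hypothetical stand-in for the library's EntityIdentifier record.
type Identifier = { PartitionKey : string; RowKey : string }

// Cut-down illustrative record; not the Game type from the post.
type GameKeyDemo =
    { Name : string
      Developer : string
      Platform : string }

// Same derivation as the GetIdentifier implementation above.
let identify (g : GameKeyDemo) =
    { PartitionKey = g.Developer; RowKey = sprintf "%s-%s" g.Name g.Platform }

let onXbox = { Name = "Halo 4"; Developer = "343 Industries"; Platform = "Xbox 360" }
let onPc = { onXbox with Platform = "PC" }

// Same partition, different rows: the two games no longer collide.
let collide = (identify onXbox = identify onPc)
printfn "%A" (identify onXbox)
printfn "Collide: %b" collide
```

With only Name as the Row Key, both records would have mapped to the same row.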

Replace EntityIdentiferReader.GetIdentifier with your own function

For those purists who don't want to dirty their types with interfaces and attributes, there is the option of replacing a statically stored function with a different implementation. For example:

let getGameIdentifier (g : Game) = 
    { PartitionKey = g.Developer; RowKey = sprintf "%s-%s" g.Name g.Platform }

EntityIdentiferReader.GetIdentifier <- getGameIdentifier

The type of GetIdentifier is:

'T -> EntityIdentifier

Setting up

The first thing to do is define a helper function, inGameTable, that will allow us to persist records to an existing table called "Games" in table storage.

open Microsoft.WindowsAzure.Storage
open Microsoft.WindowsAzure.Storage.Table

let account = CloudStorageAccount.Parse "UseDevelopmentStorage=true;" //Or your connection string here
let tableClient = account.CreateCloudTableClient()

let inGameTable game = inTable tableClient "Games" game

This technique of taking a library function and fixing the tableClient and table name parameters is very common when using FSharp.Azure's API, and you can do it to other similar library functions.
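As a rough illustration of that technique in isolation, the sketch below fixes the first two arguments of a function and leaves the last one open. inTableStub and inGamesStub are hypothetical names invented here; they only build a string and do not talk to table storage.

```fsharp
// Hypothetical stand-in for a library function that takes a client,
// a table name and some value to act on.
let inTableStub (client : string) (tableName : string) (value : 'T) =
    sprintf "%s/%s <- %A" client tableName value

// Fix the first two arguments. Writing the last parameter out explicitly
// keeps the helper generic, so it can be reused with different types.
let inGamesStub value = inTableStub "dev-client" "Games" value

let fromInt = inGamesStub 42
let fromString = inGamesStub "Halo 4"

printfn "%s" fromInt
printfn "%s" fromString
```

Writing `let inGamesStub = inTableStub "dev-client" "Games"` without the explicit parameter would not leave the helper generic, which is one reason to spell the final argument out.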

Operations

FSharp.Azure supports all the different Azure table storage modification operations and describes them in the Operation discriminated union:

type Operation<'T> =
    | Insert          of entity : 'T
    | InsertOrMerge   of entity : 'T
    | InsertOrReplace of entity : 'T
    | Replace         of entity : 'T * etag : string
    | ForceReplace    of entity : 'T
    | Merge           of entity : 'T * etag : string
    | ForceMerge      of entity : 'T
    | Delete          of entity : 'T * etag : string
    | ForceDelete     of entity : 'T

The Operation discriminated union wraps your record instance and describes the modification operation, but doesn't actually perform it. You act upon the Operation by passing it to our inGameTable helper function (which calls the inTable library function). The sections below give examples of each type of operation.
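Because an operation is just a value, it can be built, stored and inspected before anything is sent. The following self-contained sketch uses a local copy of the union (renamed Op so it is clearly not the library's type) and a hypothetical etagOf helper that is not part of FSharp.Azure.

```fsharp
// Local copy of the union from the post, renamed for this standalone sketch.
type Op<'T> =
    | Insert          of entity : 'T
    | InsertOrMerge   of entity : 'T
    | InsertOrReplace of entity : 'T
    | Replace         of entity : 'T * etag : string
    | ForceReplace    of entity : 'T
    | Merge           of entity : 'T * etag : string
    | ForceMerge      of entity : 'T
    | Delete          of entity : 'T * etag : string
    | ForceDelete     of entity : 'T

// Hypothetical helper: returns the etag an operation would check, if any.
let etagOf op =
    match op with
    | Replace (_, etag) | Merge (_, etag) | Delete (_, etag) -> Some etag
    | Insert _ | InsertOrMerge _ | InsertOrReplace _
    | ForceReplace _ | ForceMerge _ | ForceDelete _ -> None

// Building these values performs no I/O; they only describe intent.
let pending = [ Insert "Halo 4"; Replace ("Halo 4", "etag-1"); ForceDelete "Halo 4" ]
let etags = pending |> List.map etagOf

printfn "%A" etags
```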

Inserting

In order to insert a row into table storage we wrap our record using Insert and pass it to our helper function, like so:

let game = 
    { Name = "Halo 4"
      Platform = "Xbox 360"
      Developer = "343 Industries"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }

let result = game |> Insert |> inGameTable

result is of type OperationResult:

type OperationResult = 
    { HttpStatusCode : int 
      Etag : string }

The other variations of Insert (InsertOrMerge and InsertOrReplace) can be used in a similar fashion:

let result = game |> InsertOrMerge |> inGameTable
let result = game |> InsertOrReplace |> inGameTable

Replacing

Replacing a record in table storage can be done similarly to inserting, with one caveat. Azure table storage provides optimistic concurrency protection using etags, so when replacing an existing record you also need to pass the etag that matches the row in table storage. For example:

let game = 
    { Name = "Halo 4"
      Platform = "Xbox 360"
      Developer = "343 Industries"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }

let originalResult = game |> Insert |> inGameTable

let gameChanged = 
    { game with
        Notes = "Finished the game in Legendary and Heroic difficulty." }

let result = (gameChanged, originalResult.Etag) |> Replace |> inGameTable

If you want to bypass the optimistic concurrency protection and just replace the row anyway, you can use ForceReplace instead of Replace:

let result = gameChanged |> ForceReplace |> inGameTable
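The difference between the two can be modelled with a toy in-memory table. This is only a sketch of the idea of etag checking: the rows dictionary, the integer version standing in for an etag, and the insert, replace and forceReplace functions are all assumptions made for illustration, not FSharp.Azure or Azure code.

```fsharp
open System.Collections.Generic

// Toy table: row key -> (version, notes). The version plays the role of the etag.
let rows = Dictionary<string, int * string>()

let insert (key : string) (notes : string) =
    rows.[key] <- (1, notes)
    1

// Succeeds only when the caller's etag matches the stored one.
let replace (key : string) (notes : string) (etag : int) =
    match rows.TryGetValue key with
    | true, (current, _) when current = etag ->
        rows.[key] <- (current + 1, notes)
        Ok (current + 1)
    | true, _ -> Error "etag mismatch" // the real service answers 412 Precondition Failed
    | false, _ -> Error "not found"

// Ignores the etag and overwrites whatever is stored.
let forceReplace (key : string) (notes : string) =
    match rows.TryGetValue key with
    | true, (current, _) ->
        rows.[key] <- (current + 1, notes)
        Ok (current + 1)
    | false, _ -> Error "not found"

let etag1 = insert "Halo 4-Xbox 360" "Legendary"
let first = replace "Halo 4-Xbox 360" "Legendary and Heroic" etag1
let stale = replace "Halo 4-Xbox 360" "Stale write" etag1 // etag1 is now out of date
let forced = forceReplace "Halo 4-Xbox 360" "Forced write"

printfn "%A %A %A" first stale forced
```

The second replace is rejected because the first one changed the stored version, while the forced write goes through regardless.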

Merging

Merging is handled similarly to replacing, in that it requires the use of an etag. Merging can be used when you want to modify a subset of properties on a row in table storage, or a different set of properties on the same row, without affecting the other existing properties on the row.

As a demonstration, we'll define a new GameSummary record that omits the Notes field, so we can update the row without touching the Notes property at all.

type GameSummary = 
    { Name : string
      Developer : string
      Platform : string
      HasMultiplayer : bool }
    interface IEntityIdentifiable with
        member g.GetIdentifier() = 
            { PartitionKey = g.Developer; RowKey = sprintf "%s-%s" g.Name g.Platform }

Now we'll use Merge to update an inserted row:

let game = 
    { Name = "Halo 4"
      Platform = "Xbox 360"
      Developer = "343 Industries"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }

let originalResult = game |> Insert |> inGameTable

let gameSummary = 
    { GameSummary.Name = game.Name
      Platform = game.Platform
      Developer = game.Developer
      HasMultiplayer = false } //Change HasMultiplayer

let result = (gameSummary, originalResult.Etag) |> Merge |> inGameTable

Like Replace, Merge has a ForceMerge variant that ignores the optimistic concurrency protection:

let result = gameSummary |> ForceMerge |> inGameTable
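To make the contrast with replacing concrete, here is a small sketch that models a row as a map from property name to value. It is a toy model built on assumptions (string-valued properties, plain F# maps) and says nothing about how the library or the service implements either operation.

```fsharp
// A stored row and an incoming update, modelled as property name -> value.
let stored =
    Map.ofList [ ("HasMultiplayer", "true"); ("Notes", "Finished on Legendary.") ]

let update = Map.ofList [ ("HasMultiplayer", "false") ]

// Replace: the row becomes exactly the update, so Notes is lost.
let replaced = update

// Merge: properties in the update win, all other stored properties are kept.
let merged = Map.fold (fun acc key value -> Map.add key value acc) stored update

printfn "Replaced: %A" replaced
printfn "Merged:   %A" merged
```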

Deleting

Deleting is handled similarly to Replace and Merge and requires an etag.

let game = 
    { Name = "Halo 4"
      Platform = "Xbox 360"
      Developer = "343 Industries"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }

let originalResult = game |> Insert |> inGameTable

let result = (game, originalResult.Etag) |> Delete |> inGameTable

A ForceDelete variant exists for deleting even if the row has changed:

let result = game |> ForceDelete |> inGameTable

Often you want to be able to delete a row without actually loading it first. You can do this easily by using the EntityIdentifier record type, which just lets you specify the Partition Key and Row Key of the row you want to delete:

let result = 
    { EntityIdentifier.PartitionKey = "343 Industries"; RowKey = "Halo 4-Xbox 360" } 
    |> ForceDelete 
    |> inGameTable

Asynchronous Support

The inGameTable helper function we've been using uses the inTable library function, which means that operations are processed synchronously when inTable is called. Sometimes you want to be able to process operations asynchronously.

To do this we'll define a new helper function that will use inTableAsync instead:

let inGameTableAsync game = inTableAsync tableClient "Games" game

Then we can use that in a similar fashion:

let game = 
    { Name = "Halo 4"
      Platform = "Xbox 360"
      Developer = "343 Industries"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }

let result = game |> Insert |> inGameTableAsync |> Async.RunSynchronously

One obvious advantage of asynchrony is that we can very easily start performing operations in parallel. Here's an example where we insert two records in parallel:

let games = 
    [
        { Name = "Halo 4"
          Platform = "Xbox 360"
          Developer = "343 Industries"
          HasMultiplayer = true
          Notes = "Finished the game in Legendary difficulty." }

        { Name = "Halo 5"
          Platform = "Xbox One"
          Developer = "343 Industries"
          HasMultiplayer = true
          Notes = "Haven't played yet." }
    ]

let results = 
    games 
    |> Seq.map (Insert >> inGameTableAsync) 
    |> Async.Parallel
    |> Async.RunSynchronously
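The same pipeline shape can be tried without a storage account. In the sketch below, fakeInsertAsync is a hypothetical stand-in that simply waits and returns a string; it is not a library function. Async.Parallel returns its results as an array in the same order as the inputs.

```fsharp
// Hypothetical stand-in for an asynchronous insert: waits briefly,
// then returns a fake status message.
let fakeInsertAsync (name : string) =
    async {
        do! Async.Sleep 10
        return sprintf "inserted %s" name
    }

let fakeResults =
    [ "Halo 4"; "Halo 5" ]
    |> Seq.map fakeInsertAsync
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "%A" fakeResults
```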

Batching

Azure table storage provides the ability to take multiple operations and submit them to be processed all together in one go. There are many reasons why you might want to batch up operations, such as:

Reducing cost - since you are billed on a per transaction basis, batching reduces your number of transactions
Performance - instead of performing multiple HTTP requests you can batch them together into one (or more) batch requests

However, there are some restrictions on what can go into a batch. They are:

All the operations in a single batch must deal with rows in the same partition in the same table
You cannot perform multiple operations on a single row in the same batch
There can be no more than 100 operations in a single batch

FSharp.Azure provides functions to make batching easy. First we'll define a batching helper function:

let inGameTableAsBatch game = inTableAsBatch tableClient "Games" game

Now let's generate 150 Halo games and 50 Portal games, batch them up and insert them into table storage:

let games = 
    [seq { for i in 1 .. 50 -> 
            { Developer = "Valve"; Name = sprintf "Portal %i" i; Platform = "PC"; HasMultiplayer = true; Notes = "" } };
     seq { for i in 1 .. 150 -> 
            { Developer = "343 Industries"; Name = sprintf "Halo %i" i; Platform = "Xbox One"; HasMultiplayer = true; Notes = "" } }]
    |> Seq.concat
    |> Seq.toList

let results = 
    games 
    |> Seq.map Insert 
    |> autobatch 
    |> List.map inGameTableAsBatch

The autobatch function splits the games by Partition Key and then into groups of at most 100. This means we will have created three batches: one with 50 Portals, one with 100 Halos, and another with the final 50 Halo games. Each batch is then sequentially submitted to table storage.
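The grouping described here can be approximated in a few lines of plain F#. This is a sketch under stated assumptions, not the library's autobatch implementation: batchByPartition is a hypothetical helper, and the games are reduced to (developer, name) tuples with the developer acting as the Partition Key.

```fsharp
// Hypothetical helper: group items by partition key, then cut each
// group into chunks of at most maxSize items.
let batchByPartition (maxSize : int) (partitionKey : 'T -> string) (items : seq<'T>) =
    items
    |> Seq.groupBy partitionKey
    |> Seq.collect (fun (_, group) -> Seq.chunkBySize maxSize group)
    |> Seq.map List.ofArray
    |> List.ofSeq

// 50 Portal games and 150 Halo games, as (developer, name) tuples.
let demoGames =
    [ for i in 1 .. 50 -> ("Valve", sprintf "Portal %i" i) ]
    @ [ for i in 1 .. 150 -> ("343 Industries", sprintf "Halo %i" i) ]

let batches = demoGames |> batchByPartition 100 fst
let batchSizes = batches |> List.map List.length

printfn "%A" batchSizes
```

Each resulting batch contains a single partition and no more than 100 items, which matches the restrictions listed earlier.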

If we wanted to do this asynchronously and in parallel, we could first define another helper function:

let inGameTableAsBatchAsync game = inTableAsBatchAsync tableClient "Games" game

Then use it:

let results = 
    games 
    |> Seq.map Insert 
    |> autobatch 
    |> List.map inGameTableAsBatchAsync
    |> Async.Parallel
    |> Async.RunSynchronously

Conclusion

In this post, we’ve gone into gory detail about how to modify data in Azure table storage using FSharp.Azure. In a future post, I’ll do a similar deep dive into the opposite side: how to query data from table storage.