I recently got accepted onto the SQL Server Data Services (SSDS) beta program and have been getting to grips with the how, what and why of the service. For those that don't know SSDS is a cloud computing platform from Microsoft that allows us to store data in the cloud rather than host it on our own database servers. The data is available over HTTP via RESTful or SOAP protocols.
During a recent discussion on the SSDS forum Tony Petrossian who works on the SSDS team proclaimed:
If a query has a result set smaller or equal to 500 then you get all the results of the query in one page. If a query has more than 500 entities in the result set then you need to run the query multiple times to get all the results (500 at a time).
Interesting. That may come as a shock to those of us familiar with querying a database; we are used to asking for a resultset and getting all of it back in one fell swoop. What it DOES do is underline the fact that data services are a different paradigm than databases and we shouldn't make assumptions on how data services function based on behaviour that we are already used to.
A number of questions arise once you know about this artificial limit:
-
Firstly,
Why is the result set capped at all? Its fairly simple really. Its not a good idea to saturate yours and other people's internet connection by pulling back all results at once, especially when the results are delivered as verbose XML. If your resultset has millions of records in it then that's a lot of data to be pulled across an already
creaking internet backbone.
-
Am I now unable to get at records that don't get returned in the first 500? Fear not, you're not being restricted from accessing your data. The SSDS team have effectively implemented a paging algorithm that chunks the data up into into 500-record pages. Tony has posted
some code that demonstrates how to iterate over these pages and return all the data.
-
Using that code am I not going to end up making requests for data when there isn't any to be returned? Its true that if you use Tony's code then its possible that the very last request won't contain any data but given that this can only occur when the number of results is exactly divisible by 500 then is it really a problem? How often are you going to be returning more than 500 results anyway? There was general consensus on the
discusson thread that it would be nice to know if the resultset had been capped or not so its possible that that feature may be added later in the beta cycle. Let me know what you think about this one.
-
Is it possible that I will miss records that get created in between the first and last page request? Yes, this is definitely possible. Do you think there are scenarios in which this becomes a problem? Remember that paging algorithms have been in use for years and its possible to miss newly created records when using those as well or, probably even worse, its possible that you get some records more than once (depends on your paging algorithm).
-
Will the value of 500 be configurable? I asked this on the
discussion thread and at the time of writing haven't received an answer.
More to come on SSDS in the future.
-Jamie