DataLoaders and cache invalidation
Performance is a key factor in Front-Commerce projects. We aim to make your GraphQL server as fast as possible while ensuring that your data is always up-to-date.
This section introduces the DataLoader pattern and how to use it in a Front-Commerce application. It then details how to configure the caching layer to keep remote API responses in cache and boost performance. Finally, it explains how to invalidate the cache from remote systems.
By the end of this guide, you will have a good understanding of the mechanisms at your disposal to compensate for the slowness of a remote API.
Why?
Imagine a resolver on a Product.qty field that fetches the current quantity in stock for a product from the https://inventory.example.com/stock/PRODUCT_SKU remote API.
Let's see what happens when we run a query like this on our GraphQL endpoint:
{
  category("pants") {
    name
    products({ limit: 10 }) {
      sku
      name
      qty
    }
  }
}
The problem
This query would lead to 12 HTTP requests from the server to the remote datasource:
- 1 request to fetch category information
- 1 request to fetch products in the category, with sku and name (in the best case)
- 10 additional requests to fetch the qty field of each product
Furthermore, the qty requests will only start after the previous category response has been received, leading to network waterfalls that delay the GraphQL response.
This problem is also known as the N+1 problem and DataLoaders are a way to solve this using batching and caching.
Batching
Batching, in this context, is the process of grouping all the data that is required so that it can be retrieved efficiently.
Let's suppose that our category contains 10 products: PANT-01, PANT-02, …, PANT-10. Then, instead of making 10 HTTP requests (one per product stock), we could leverage batching to fetch all product inventories with a single remote API call (ex: https://inventory.example.com/stocks?skus=PANT-01,PANT-02,…,PANT-10).
This kind of batch endpoint is not always available on remote services. But when one exists, it can avoid many remote API calls and lead to better performance.
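To make the idea concrete, here is a minimal sketch of batching in plain JavaScript. It is not the graphql/dataloader implementation: createBatcher and fetchStocks are hypothetical names, and the remote API is replaced by an in-memory stub. Keys requested during the same synchronous run are collected and resolved with a single batch call.

```javascript
// Hypothetical helper: groups every key requested in the same synchronous run
// and resolves them with a single call to `batchFn`.
function createBatcher(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // First key of the group: schedule one flush for the whole group
        queueMicrotask(() => {
          const pending = queue;
          queue = [];
          batchFn(pending.map((item) => item.key))
            .then((values) =>
              pending.forEach((item, index) => item.resolve(values[index]))
            )
            .catch((error) => pending.forEach((item) => item.reject(error)));
        });
      }
    });
  };
}

// Stub standing in for a batch endpoint such as /stocks?skus=PANT-01,PANT-02,…
const batchCalls = [];
const fetchStocks = (skus) => {
  batchCalls.push(skus);
  return Promise.resolve(skus.map((sku) => ({ sku, qty: 5 })));
};

const loadStock = createBatcher(fetchStocks);
const stocksPromise = Promise.all(
  ["PANT-01", "PANT-02", "PANT-03"].map((sku) => loadStock(sku))
);
```

In this sketch, the three loadStock calls trigger a single fetchStocks call that receives the three SKUs at once.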
Caching
Caching mechanisms are useful for GraphQL resolvers in two ways:
- to prevent re-fetching the same data across different queries (or for different users)
- to prevent re-fetching the same data twice during the same query resolution
The first use case is something one may already know from other systems. In the previous example, caching would make the remote API calls when the first user visits the page, then serve this information from the cache for subsequent visitors. The following GraphQL responses will then be faster and the remote system's load will decrease dramatically.
The second one is more specific to GraphQL. To understand its gain, we should consider the following query:
{
  category("pants") {
    name
    products({ limit: 10 }) {
      sku
      name
      qty
      upsells({ limit: 2 }) {
        sku
        name
        qty
      }
    }
  }
}
If the product PANT-01 was an upsell of all other products in the pants category, the inventory API would be requested again when resolving the upsells.qty field. Query-level caching prevents those extra calls by reusing the response that was already fetched to resolve products.qty.
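The query-level caching described above can be sketched as a simple memoization of promises. This is an illustration only: createRequestCache and fetchQty are hypothetical names, and the inventory API is replaced by a stub.

```javascript
// Hypothetical helper: remembers the promise returned for each key
// so that the same key is only fetched once.
function createRequestCache(fetchOne) {
  const cache = new Map();
  return function load(key) {
    if (!cache.has(key)) {
      cache.set(key, fetchOne(key));
    }
    return cache.get(key);
  };
}

// Stub standing in for the inventory API
let remoteCalls = 0;
const fetchQty = (sku) => {
  remoteCalls += 1;
  return Promise.resolve({ sku, qty: 3 });
};

// One cache instance would be created per GraphQL request
const loadQty = createRequestCache(fetchQty);
const qtyPromise = Promise.all([
  loadQty("PANT-01"), // resolving products.qty
  loadQty("PANT-01"), // resolving upsells.qty for the same product
]);
```

Caching the promise rather than the resolved value means that a second request for the same key reuses the in-flight call, even if the first one has not completed yet.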
Manually clearing the cache
Clearing the cache manually is possible on some of the supported platforms. Read the dedicated platform pages for details.
What are DataLoaders?
DataLoader is a pattern promoted by Facebook, from their internal implementations, to solve problems with data fetching. We use this name because it is the name of the reference implementation in JavaScript: graphql/dataloader.
A DataLoader is instantiated with a batching function that fetches data in a grouped way (see Batching above). It also has a caching strategy that prevents fetching the same data twice in the same request or across requests (see Caching above).
By default every DataLoader provides request-level caching. But this can be configured to switch to a persistent caching strategy instead. For instance, Front-Commerce provides a Redis strategy to share the cache between users and requests.
In our previous example, if the Product.qty resolver was implemented using a DataLoader, the query could have been resolved using only 3 remote API requests:
- 1 request to fetch category information
- 1 request to fetch products in the category
- 1 batch request to fetch quantities of the products if an API was available: https://inventory.example.com/stocks?skus=PANT-01,PANT-02,…,PANT-10
We encourage you to read the DataLoader readme documentation to learn more about how it works.
Front-Commerce provides a factory function to create DataLoaders from your GraphQL modules while keeping caching strategies configurable. Under the hood it is a pure DataLoader instance, so you could use it in a standard manner.
Using DataLoaders in Front-Commerce
When building a GraphQL module, Front-Commerce will inject a makeDataLoader factory function in your module's contextEnhancer function.
makeDataLoader usage
The makeDataLoader factory allows developers to build a DataLoader without worrying about the current store scope (in a multi-store environment) or caching concerns.
Here is an example based on the use case above:
// Module definition
import axios from "axios";
import typeDefs from "./schema.gql";
import resolvers from "./resolvers";
import StockLoader from "./loader";

export default {
  namespace: "Acme/Inventory",
  typeDefs,
  resolvers,
  contextEnhancer: ({ makeDataLoader, req }) => {
    const axiosInstance = axios.create({
      baseURL: req.config.inventoryApiEndpointUrl,
    });
    return {
      // create an instance of the loader, to be made available in resolvers
      Stock: StockLoader(makeDataLoader, axiosInstance),
    };
  },
};

# ./schema.gql
extend type Product {
  qty: Int
}

// ./resolvers
export default {
  Product: {
    qty: ({ sku }, _, { loaders }) => {
      // use the loader instance to fetch data
      // batching and caching are transparent in the resolver
      return loaders.Stock.loadBySku(sku);
    },
  },
};

// ./loader
import { reorderForIds } from "server/core/graphql/dataloaderHelpers";

const StockLoader = (makeDataLoader, axiosInstance) => {
  // our batching function that will be injected in the DataLoader factory
  // it is important to return results in the same order as the passed `skus`
  // hence the use of `reorderForIds` (documented later in this page)
  const loadStocksBatch = (skus) => {
    return axiosInstance
      .get("/stocks", { params: { skus } })
      .then((response) => response.data.items)
      .then(reorderForIds(skus, "sku"));
  };

  // The `Stock` key here must be unique across the project
  // and is used in cache configuration to determine the caching strategy to use
  const loader = makeDataLoader("Stock")((skus) => loadStocksBatch(skus));

  return {
    // `loader` is a standard DataLoader instance, so you can use any available methods on it
    loadBySku: (sku) => loader.load(sku),
  };
};

export default StockLoader;
The second argument of the function returned by makeDataLoader is the options object passed to the DataLoader instance. You usually don't have to use it. Please refer to dataloader's documentation for further information.
Useful patterns
Prevent caching errors (data not found)
Batching functions will sometimes return null or falsy data for nonexistent items. By default, these values will be cached, so further data retrieval could return this null value instead of doing a remote API call.
In some specific cases, you may want to force fetching data from the remote source every time. You can do so by returning an Error for the nonexistent item.
Here is an example:
const fooLoader = makeDataLoader("AcmeFoo")((ids) =>
  loadFooBatch(ids).then((items) =>
    items.map((item) => {
      if (!item) {
        return new Error("not found");
      }
      return item;
    })
  )
);
Using a predefined TTL
In some contexts, cache invalidation could be impossible or difficult to implement in remote systems. You may still want to leverage Front-Commerce's caching features, such as the Redis persistent cache, to improve performance of your application.
The Redis strategy supports an additional option (to be provided during instantiation) that allows you to create a loader with a specified expiration time for cached items.
Here is how you could use it:
const fooLoader = makeDataLoader("AcmeFoo")(
  (ids) => loadFooBatch(ids),
  { expire: EXPIRE_TIME_IN_SECONDS } // see https://github.com/DubFriend/redis-dataloader
);
Caching scalar values
DataLoaders mostly manipulate objects. Hence, it is safer to design your application to return objects from batching functions. This ensures compatibility with a wider range of caching strategies (ex: the Redis strategy does not support caching of scalar values).
const fooLoader = makeDataLoader("AcmeFoo")((ids) =>
  loadFooBatch(ids).then((results) =>
    results.map((result) => ({ value: result }))
  )
);
// …
return fooLoader.load(id).then((data) => data.value);
Helpers available to create dataLoaders
Writing batching functions and loaders could lead to reusing the same patterns. We have extracted some utility functions to help you in this task.
You can find them in the server/core/graphql/dataloaderHelpers module.
reorderForIds
Batch functions must satisfy two constraints to be used in a DataLoader (from the graphql/dataloader documentation):
- The Array of values must be the same length as the Array of keys.
- Each index in the Array of values must correspond to the same index in the Array of keys.
reorderForIds will ensure that these constraints are satisfied.
Signature: const reorderForIds = (ids, idKey = "id") => data => sortedData;
It will sort data by idKey to match the order of the ids array passed as a parameter. If no matching value is found for an id, it will return null at that position and log a message so that you can understand why no result was found for that id.
Example:
// skus will very likely be a param of your batch loader
const skus = ["P01", "P02", "P03"];
return (
  axiosInstance
    .get("/frontcommerce/price", {
      params: {
        skus: skus.join(","),
      },
    })
    .then((response) => {
      const prices = response.data;
      /* [
        {sku: "P02", price: 12},
        {sku: "P03", price: 13},
        {sku: "P01", price: 11},
      ] */
      return prices;
    })
    // results will be sorted according to the initial skus passed (P01, P02, P03)
    .then(reorderForIds(skus, "sku"))
);
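To illustrate the behavior described above, here is a hypothetical re-implementation of such a helper. It is a sketch written for this page, not the actual Front-Commerce source: reorderForIdsSketch is an invented name and the logged message is an assumption.

```javascript
// Hypothetical re-implementation, for illustration only
const reorderForIdsSketch = (ids, idKey = "id") => (data) => {
  // Index results by their identifier for direct lookups
  const byId = new Map(data.map((item) => [item[idKey], item]));
  return ids.map((id) => {
    if (!byId.has(id)) {
      // Assumed message format: the real helper may log differently
      console.warn(`No result found for ${idKey}="${id}"`);
      return null;
    }
    return byId.get(id);
  });
};

const sorted = reorderForIdsSketch(["P01", "P02", "P04"], "sku")([
  { sku: "P02", price: 12 },
  { sku: "P01", price: 11 },
]);
// `sorted` has one entry per requested sku, in the requested order,
// with null in place of the missing "P04"
```

Because the output always has one entry per requested id, both DataLoader constraints hold even when the remote API omits some items.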
reorderForIdsCaseInsensitive
As its name implies, it is very similar to reorderForIds, but ids are compared in a case-insensitive way.
Example:
return axiosInstance
  .get(`/products`, { params: searchCriteria })
  .then((response) => response.data.items.map(convertMagentoProductForFront))
  .then(reorderForIdsCaseInsensitive(skus, "sku"));
makeBatchLoaderFromSingleFetch
Until now, we created batching functions using a remote API that allowed requesting several results at once (https://inventory.example.com/stocks?skus=PANT-01,PANT-02,…,PANT-10).
When using 3rd party APIs or legacy systems, such APIs might not always be available. Using dataLoaders in this case will not reduce the number of requests needed for uncached data. However, caching can still prevent most of these requests in practice. It is thus very convenient when dealing with a slow service.
The makeBatchLoaderFromSingleFetch helper allows you to easily create a batching function from a single fetching function.
Pseudo signature:
makeBatchLoaderFromSingleFetch = (
function singleFetch, // function that fetches data for a single id
function singleResponseMapper = ({ data }) => data // function that transform a response into data
) => ids => Observable(sortedData);
Example (from the Magento2 category loader):
import { makeBatchLoaderFromSingleFetch } from "server/core/graphql/dataloaderHelpers";
// …
const loadBatch = makeBatchLoaderFromSingleFetch(
  (id) => axiosInstance.get(`/categories/${id}`),
  (response) => convertCategoryMainAttributesForFront(response.data)
);
const loader = makeDataLoader("CatalogCategory")(
  (ids) => loadBatch(ids).toPromise() // <-- note the `toPromise()` here
);
The batching function created by makeBatchLoaderFromSingleFetch returns an Observable. You must thus convert it to a Promise using the .toPromise() method.
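The principle behind this helper can be sketched with plain Promises. Note that this is a deliberate simplification: the real helper returns an Observable, whereas the hypothetical makeBatchLoaderSketch below returns a Promise, and fetchCategory is a stub standing in for an HTTP call.

```javascript
// Hypothetical Promise-based variant (the real helper returns an Observable)
const makeBatchLoaderSketch =
  (singleFetch, singleResponseMapper = ({ data }) => data) =>
  (ids) =>
    // One request per id, run in parallel; Promise.all keeps the order of `ids`
    Promise.all(ids.map((id) => singleFetch(id).then(singleResponseMapper)));

// Stub standing in for a call such as axiosInstance.get(`/categories/${id}`)
const fetchCategory = (id) =>
  Promise.resolve({ data: { id, name: `Category ${id}` } });

const loadCategories = makeBatchLoaderSketch(fetchCategory);
const categoriesPromise = loadCategories([3, 1, 2]);
```

In this sketch a single failing request rejects the whole batch, because that is how Promise.all behaves. Whether the real helper behaves the same way is not covered here.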
Caching dataLoaders data
By default, all dataLoaders are using a per-request in-memory caching strategy. It means that during the same GraphQL query, the same data will only be requested once.
Front-Commerce also ships with a persistent cache implementation, using a Redis strategy (see Caching strategies). You can implement new strategies to support more services (we can also help and support more strategies, please contact us).
The DataLoader cache must be configured in the config/caching.js configuration file.
Please refer to the reference documentation for further details or read the following section to choose the most relevant strategies for your context.
Caching strategies
This section provides details about the strategy implementations available in Front-Commerce. To use one, reference it in the implementation key of your caching strategies configuration.
Redis
The Redis implementation caches data in a Redis server. It uses node's redis package as a Redis client. Please refer to the package documentation for all the available options.
Here is a full configuration example:
export default {
  strategies: [
    {
      implementation: "Redis",
      supports: "*",
      config: {
        // see https://www.npmjs.com/package/redis#options-object-properties
        host:
          process.env.FRONT_COMMERCE_CLOUD_REDIS_HOST || // Front-Commerce Cloud variables
          process.env.FRONT_COMMERCE_REDIS_HOST || // default variables
          "127.0.0.1", // default value (localhost)
        port:
          process.env.FRONT_COMMERCE_CLOUD_REDIS_PORT ||
          process.env.FRONT_COMMERCE_REDIS_PORT ||
          6379,
        db:
          process.env.FRONT_COMMERCE_CLOUD_REDIS_DB ||
          process.env.FRONT_COMMERCE_REDIS_DB ||
          0,
        // Front-Commerce options
        // defaultExpireInSeconds: 82800, // default: 23 hours
        // invalidationScanIterationSize: 2000, // default: 1000
      },
    },
  ],
};
PerMagentoCustomerGroup
Since version 2.0.0-rc.0
The PerMagentoCustomerGroup implementation is a decorator that is specific to Magento 1 and Magento 2 modules. It will decorate the existing caching strategies so that DataLoader keys are specific to the current customer group. We highly recommend using it on Magento stores that have prices per group, so they can leverage other caching mechanisms (such as Redis).
It is possible to provide a default group to use as scope for guest users.
Here is a configuration example:
export default {
  strategies: [
    // The PerMagentoCustomerGroup strategy MUST be registered after a persistent cache implementation
    // because it has no effect in the context of the default per-request in-memory caching.
    {
      implementation: "Redis",
      supports: "*",
      config: {
        host: "127.0.0.1",
      },
    },
    {
      implementation: "PerMagentoCustomerGroup",
      // Will scope all data from the CatalogPrice DataLoader with the customer group
      // (and other relevant price data loaders)
      // before they are transmitted to the previous strategy (Redis).
      // Other dataLoaders will use Redis storage in a standard fashion.
      supports: [
        "CatalogPrice",
        "CatalogProductChildrenPrice",
        "CatalogProductBundlePrice",
      ],
      config: {
        defaultGroupId: 0,
      },
    },
  ],
};
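The general idea of a key-scoping decorator can be sketched as follows. This is a conceptual illustration under stated assumptions, not how Front-Commerce implements its strategies: scopeLoaderKeys, the key format and the in-memory Map standing in for Redis are all hypothetical.

```javascript
// Hypothetical sketch: wraps a loader so every key is prefixed with a scope
const scopeLoaderKeys = (load, scope) => (key) => load(`${scope}:${key}`);

// Stub persistent cache shared by all users (stands in for Redis)
const sharedCache = new Map();
const cachedLoad = (key) => {
  if (!sharedCache.has(key)) {
    sharedCache.set(key, `price for ${key}`);
  }
  return sharedCache.get(key);
};

// Each customer group reads and writes its own cache entries
const loadForGuests = scopeLoaderKeys(cachedLoad, "group-0");
const loadForWholesale = scopeLoaderKeys(cachedLoad, "group-2");
loadForGuests("VSK12");
loadForWholesale("VSK12");
```

The same product id ends up under two distinct cache entries, so a price cached for one customer group is never served to another.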
PerMagentoCustomerTaxZone
Since version 2.14 on Magento 2. Since version 2.26 on Magento 1.
The PerMagentoCustomerTaxZone implementation is a decorator that is specific to the Magento platform (Magento 1 and Magento 2). It will decorate the existing caching strategies so that DataLoader keys are specific to the current customer's tax zone. We highly recommend using it on Magento stores whose prices include taxes that depend on several tax zones, so they can leverage other caching mechanisms (such as Redis).
As such a strategy can be complex, the tax zone definition is left to the integrator through the taxZoneKeyFromAddress function.
Here is a configuration example:
export default {
  strategies: [
    // The PerMagentoCustomerTaxZone strategy MUST be registered after a persistent cache implementation
    // because it has no effect in the context of the default per-request in-memory caching.
    {
      implementation: "Redis",
      supports: "*",
      config: {
        host: "127.0.0.1",
      },
    },
    {
      implementation: "PerMagentoCustomerTaxZone",
      supports: [
        "CatalogPrice",
        "CatalogProductChildrenPrice",
        "CatalogProductBundlePrice",
      ],
      config: {
        addressType: "shipping", // or "billing" depending on which address is used to define the taxes shown to the user
        defaultTaxZoneKey: "FR",
        // Uncomment this line to override the default tax zone partitioning
        // taxZoneKeyFromAddress: (address) => {
        //   // see https://docs.magento.com/user-guide/tax/tax-zones-rates.html
        //   return address.country_id;
        // },
      },
    },
  ],
};
PerCurrency
Since version 2.2
The PerCurrency implementation is a decorator that is specific to Magento 1 integrations. It will decorate the existing caching strategies so that DataLoader keys are different depending on the store's currency selected by the user. This is only useful if a store has multiple currencies (config/stores.js::availableCurrencies).
It should be used on any DataLoader that returns a price value based on the user's session.
Here is a configuration example:
export default {
  strategies: [
    // The PerCurrency strategy MUST be registered after a persistent cache implementation
    // because it has no effect in the context of the default per-request in-memory caching.
    {
      implementation: "Redis",
      supports: "*",
      config: {
        host: "127.0.0.1",
      },
    },
    {
      implementation: "PerCurrency",
      supports: [
        "CatalogPrice",
        "CatalogProductChildrenPrice",
        "CatalogProductBundlePrice", // only for Magento 2
        "CatalogProductBundle", // only for Magento 1
      ],
    },
  ],
};
PerMagentoAdminRole
Since version 2.1
The PerMagentoAdminRole implementation is a decorator that is specific to Magento 2 integrations. It will decorate the existing caching strategies so that DataLoader keys are different for admin users and customers. This is useful if a custom Magento API returns a different result for admins and normal customers.
Here is a configuration example:
export default {
  strategies: [
    // The PerMagentoAdminRole strategy MUST be registered after a persistent cache implementation
    // because it has no effect in the context of the default per-request in-memory caching.
    {
      implementation: "Redis",
      supports: "*",
      config: {
        host: "127.0.0.1",
      },
    },
    {
      implementation: "PerMagentoAdminRole",
      supports: ["CatalogProduct"],
    },
  ],
};
Advanced usage
If you need additional implementations or want to leverage strategies for a specific use case, please contact us so we can discuss it and guide you!
Invalidating the cache
When using a persistent cache, remote systems must invalidate cached data whenever the underlying data changes.
Front-Commerce provides several endpoints for this. They respond to GET or POST requests and are secured with a token that must be passed in an auth-token header. The expected token must be configured with the FRONT_COMMERCE_CACHE_API_TOKEN environment variable.
POST for batched invalidations
Since version 2.1
This is the recommended way to invalidate the cache. It invalidates several entries in a single HTTP call, which is more efficient.
- Endpoint: /_cache invalidates all data from the scopes sent in the body
- Body: list of cache invalidation descriptors with the following object keys:
  - scope: shop code (for instance one store)
  - key: loader key to invalidate
  - id: single id to invalidate for the key loader (in the given scope)
For each key of the invalidation descriptor, it is possible to use the value "all" (reserved keyword) to invalidate every defined object. See the example below.
[
{ scope: "default", key: "CatalogProduct", id: "VSK12" },
{ scope: "default", key: "all", id: "VSK13" },
{ scope: "all", key: "CatalogCategory", id: "42" },
{ scope: "default", key: "CmsPage", id: "all" },
];
The payload is limited to 1 MB by default to prevent abuse. You can raise this limit through configuration. See "I cannot POST a big payload to the server" for a way to define a greater value.
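As an illustration, a remote system could prepare such a batched invalidation request as sketched below. The helper name buildBatchInvalidation, the https://shop.example.com base URL and the JSON Content-Type header are assumptions made for this example. Only the /_cache endpoint, the auth-token header and the descriptor keys come from the description above.

```javascript
// Hypothetical helper: builds the arguments for fetch(url, options)
const buildBatchInvalidation = (baseUrl, token, descriptors) => ({
  url: `${baseUrl}/_cache`,
  options: {
    method: "POST",
    headers: {
      "auth-token": token, // must match FRONT_COMMERCE_CACHE_API_TOKEN
      "Content-Type": "application/json", // assumption: the body is sent as JSON
    },
    body: JSON.stringify(descriptors),
  },
});

const request = buildBatchInvalidation("https://shop.example.com", "secret", [
  { scope: "default", key: "CatalogProduct", id: "VSK12" },
  { scope: "all", key: "CatalogCategory", id: "42" },
]);
// Send it with an HTTP client, for instance: fetch(request.url, request.options)
```

Keeping the request construction separate from the HTTP call makes it easy to check the payload size before sending it.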
GET for atomic invalidations
These endpoints were the first ones implemented in Front-Commerce. They are less efficient than batching invalidations, but may be more convenient for webhooks or simple scripts.
- /_cache: invalidate all data in persistent cache
- /_cache/:scope: invalidate all data for a given scope (for instance one store)
- /_cache/:scope/:key: invalidate all data of a given loader (matching :key) for a given store
- /_cache/:scope/:key/:id: invalidate cached data for a single id of a given loader in a given store
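The routes above follow a regular pattern, where each additional segment narrows the invalidation. A small sketch of a path builder, assuming the hypothetical helper name cacheInvalidationPath and assuming that segments should be URL-encoded:

```javascript
// Hypothetical helper: omitted trailing arguments widen the invalidation.
// Arguments must be provided from left to right (scope, then key, then id).
const cacheInvalidationPath = (scope, key, id) =>
  ["/_cache", scope, key, id]
    .filter((segment) => segment !== undefined)
    .map((segment, index) =>
      // Assumption: values are URL-encoded; the "/_cache" prefix is kept as-is
      index === 0 ? segment : encodeURIComponent(segment)
    )
    .join("/");

const everything = cacheInvalidationPath();
const oneStore = cacheInvalidationPath("default");
const oneLoader = cacheInvalidationPath("default", "CatalogProduct");
const oneEntry = cacheInvalidationPath("default", "CatalogProduct", "VSK12");
```

The resulting path is then requested with GET and the auth-token header described earlier.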
Our Magento 2 and Magento 1 extensions handle cache invalidation by default. Please refer to their respective documentation to learn how to add your own invalidation logic (for custom Magento entities).
Troubleshooting
Front-Commerce provides a way to debug several aspects of the caching layer. You can use the DEBUG="front-commerce:cache" environment variable to view information about caching strategies used for a GraphQL query, along with cache invalidation requests received by your Front-Commerce server.