System Design Series: DuPont analysis using Distributed Snapshot

Ritesh Shergill
4 min read · Feb 12, 2024

A financial firm maintains financial records of around 50,000 companies and wants to determine the financial performance of all these companies.

But first, since the title mentions it, we must define it:

What is DuPont analysis?

The DuPont analysis is a framework for analyzing fundamental performance of an organization. It is a useful technique used to decompose the different drivers of return on equity.

DuPont analysis is a useful tool to compare the operational efficiency of two similar firms.

Essentially, it evaluates the component parts of a company’s ROE. This allows an investor to determine what financial activities contribute the most to the changes in ROE and thus performance.

The formula for calculating DuPont analysis is as follows:

DuPont Analysis = Net Profit Margin * Asset Turnover * Equity Multiplier

Ref: Investopedia, "DuPont Analysis: The DuPont Formula Plus How to Calculate and Use It"

https://www.investopedia.com/terms/d/dupontanalysis.asp
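The article names the three factors without spelling them out. As a sketch, assuming the standard three-step DuPont definitions (with revenue standing in for sales, and averages taken over the reporting period), the product can be written out so the cancellation is visible:

```latex
\begin{align*}
\text{ROE}
  &= \underbrace{\frac{\text{Net Income}}{\text{Revenue}}}_{\text{Net Profit Margin}}
     \times
     \underbrace{\frac{\text{Revenue}}{\text{Average Total Assets}}}_{\text{Asset Turnover}}
     \times
     \underbrace{\frac{\text{Average Total Assets}}{\text{Average Shareholders' Equity}}}_{\text{Equity Multiplier}} \\
  &= \frac{\text{Net Income}}{\text{Average Shareholders' Equity}}
\end{align*}
```

Revenue and Average Total Assets cancel, which is why the product equals ROE. The value of the decomposition lies in seeing which of the three factors moved.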

So what does this have to do with the distributed snapshot pattern?

Well, these calculations are likely to be heavy, since inputs such as Net Income and Average Total Assets typically come from ETL pipelines or costly database queries.

By carrying out these calculations in parallel for 50,000 companies, we can get the numbers required to finally calculate the DuPont analysis. Many of these numbers would already have been calculated as part of each company’s financial reports, so there is no need to calculate them again. Instead, as the individual numbers are calculated, we store them in a centrally accessible location like an S3 bucket or a database. This is what the architecture looks like:

[Figure: architecture diagram]
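To make the fan-out concrete, here is a minimal sketch of per-company calculations running in parallel and landing in one shared location. Everything named here is an assumption for illustration: `RAW_FINANCIALS` stands in for the ETL or query output, `central_store` stands in for the S3 bucket or database, and a thread pool stands in for whatever workers the real pipelines would run on.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw figures per company; in practice these would come from
# ETL jobs or database queries, not an in-memory dict.
RAW_FINANCIALS = {
    "ACME": {"net_income": 120.0, "revenue": 1000.0,
             "avg_total_assets": 2000.0, "avg_equity": 800.0},
    "GLOBEX": {"net_income": 50.0, "revenue": 400.0,
               "avg_total_assets": 500.0, "avg_equity": 250.0},
}

# Stand-in for the central store (S3 bucket or database table).
central_store = {}


def compute_components(company_id):
    """Compute the three DuPont inputs for one company."""
    f = RAW_FINANCIALS[company_id]
    return company_id, {
        "net_profit_margin": f["net_income"] / f["revenue"],
        "asset_turnover": f["revenue"] / f["avg_total_assets"],
        "equity_multiplier": f["avg_total_assets"] / f["avg_equity"],
    }


def run_all(company_ids, max_workers=8):
    """Fan out one task per company and collect the results centrally."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Results are written from the calling thread, so no locking is needed.
        for company_id, components in pool.map(compute_components, company_ids):
            central_store[company_id] = components
    return central_store


run_all(list(RAW_FINANCIALS))
```

Each task touches only its own company, which is what lets the work scale out to 50,000 companies without coordination between tasks.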

Now we use ETL pipelines to process the individual calculations from these buckets or databases and form the distributed snapshot of each intermediate calculation.

These pipelines can be parallelized for each company to make the calculation faster.

We have now stored the distributed snapshot of:

💿 Net Profit Margin

💿 Asset Turnover

💿 Equity Multiplier

And we can reuse these numbers in any other calculation.
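As an illustration of that reuse, the sketch below stores the three snapshots once and lets a different consumer read two of them. The `Snapshot` and `SnapshotStore` types, their field names, and the sample values are all hypothetical. The second consumer computes return on assets, which under the standard definitions equals Net Profit Margin times Asset Turnover.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One stored intermediate value (all field names are illustrative)."""
    company_id: str
    metric: str
    as_of: date
    value: float


class SnapshotStore:
    """In-memory stand-in for the shared bucket or database table."""

    def __init__(self):
        self._rows = {}

    def put(self, snap):
        self._rows[(snap.company_id, snap.metric, snap.as_of)] = snap

    def get(self, company_id, metric, as_of):
        return self._rows[(company_id, metric, as_of)].value


store = SnapshotStore()
q4 = date(2023, 12, 31)
store.put(Snapshot("ACME", "net_profit_margin", q4, 0.12))
store.put(Snapshot("ACME", "asset_turnover", q4, 0.5))
store.put(Snapshot("ACME", "equity_multiplier", q4, 2.5))


def return_on_assets(store, company_id, as_of):
    """A different consumer: ROA = Net Profit Margin * Asset Turnover."""
    return (store.get(company_id, "net_profit_margin", as_of)
            * store.get(company_id, "asset_turnover", as_of))


roa = return_on_assets(store, "ACME", q4)
```

The ROA consumer never recomputes Net Income or Average Total Assets. It only reads what the earlier pipelines already wrote.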

Now assuming that we need to calculate the DuPont analysis for all 50,000 companies, all we need to do is fetch the snapshot of each individual calculation and perform the final calculation:

DuPont Analysis = Net Profit Margin * Asset Turnover * Equity Multiplier

This can be done quickly by another ETL pipeline meant specifically for calculating DuPont Analysis.

And thus, we will quickly start getting results in the final Results table as the processing happens in parallel.
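A minimal sketch of that final pipeline, assuming the snapshots live in a table with one row per company and metric. The schema, table names, and sample rows are made up for illustration, and an in-memory SQLite database stands in for the real store. Companies whose three components have not all landed yet are skipped and picked up on a later run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (company_id TEXT, metric TEXT, value REAL,
                            PRIMARY KEY (company_id, metric));
    CREATE TABLE results (company_id TEXT PRIMARY KEY, dupont_roe REAL);
""")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [
        ("ACME", "net_profit_margin", 0.12),
        ("ACME", "asset_turnover", 0.5),
        ("ACME", "equity_multiplier", 2.5),
        ("GLOBEX", "net_profit_margin", 0.125),
        ("GLOBEX", "asset_turnover", 0.8),
        # GLOBEX equity_multiplier has not landed yet.
    ],
)

REQUIRED = ("net_profit_margin", "asset_turnover", "equity_multiplier")


def run_dupont_pipeline(conn):
    """Multiply the three snapshots for every company that has all of them."""
    rows = conn.execute("SELECT company_id, metric, value FROM snapshots")
    by_company = {}
    for company_id, metric, value in rows:
        by_company.setdefault(company_id, {})[metric] = value
    for company_id, metrics in by_company.items():
        if all(m in metrics for m in REQUIRED):
            roe = (metrics["net_profit_margin"]
                   * metrics["asset_turnover"]
                   * metrics["equity_multiplier"])
            # Re-running the pipeline overwrites the previous result.
            conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?)",
                (company_id, roe),
            )
    conn.commit()


run_dupont_pipeline(conn)
results = dict(conn.execute("SELECT company_id, dupont_roe FROM results"))
```

Because the heavy inputs are already stored, this step is three lookups and two multiplications per company.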

It is evident that the Distributed Snapshot pattern is applicable to data pipelines.

What are the benefits we get with this approach?

🟠 Calculations for each company can be carried out independently and in parallel, which speeds up the process significantly.

🟢 We don’t need to wait for every calculation to complete. We can always calculate from the latest value or a previous value, so calculating historical numbers becomes easier.

🔵 We can use AI/ML techniques to forecast values from computed snapshots.

🟣 We can reuse subparts of the computations in other calculations. For example, Net Income could be used in another calculation by another pipeline.

🟡 As each computation is independent, reporting tools can function independently of each other by referring to the single source of truth. Reconciliation also becomes easier for the same reason.

🟤 If some erroneous values were fed in, retroactive corrections can be applied easily without affecting the other calculations.

Thus, the Distributed snapshot pattern divides intermediate computations into separate and re-usable concerns while also maintaining historical snapshots.
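Two of the benefits above, falling back to a previous value and correcting a value after the fact, can be sketched with a small dated history per metric. `MetricHistory` is a hypothetical helper and the dates and values are invented. The pattern itself does not prescribe this structure.

```python
import bisect
from datetime import date


class MetricHistory:
    """Dated values for one company and one metric (hypothetical helper)."""

    def __init__(self):
        self._dates = []   # sorted as-of dates
        self._values = {}  # as-of date -> value

    def record(self, as_of, value):
        """Store a value; re-recording a date acts as a retroactive correction."""
        if as_of not in self._values:
            bisect.insort(self._dates, as_of)
        self._values[as_of] = value

    def latest_on_or_before(self, when):
        """Return the most recent value with as_of <= when, or None."""
        i = bisect.bisect_right(self._dates, when)
        if i == 0:
            return None
        return self._values[self._dates[i - 1]]


margin = MetricHistory()
margin.record(date(2023, 9, 30), 0.10)
margin.record(date(2023, 12, 31), 0.12)

# The Q1 2024 snapshot is not ready, so a reader falls back to Q4 2023.
current = margin.latest_on_or_before(date(2024, 2, 12))

# Correct the Q3 figure after the fact; the Q4 snapshot is untouched.
margin.record(date(2023, 9, 30), 0.11)
q3 = margin.latest_on_or_before(date(2023, 10, 15))
q4 = margin.latest_on_or_before(date(2023, 12, 31))
before_any = margin.latest_on_or_before(date(2023, 1, 1))
```

A downstream pipeline that depends on the corrected Q3 value only needs to re-read it. Nothing keyed to other dates has to be recomputed.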

I have documented the specifics of the distributed snapshot architecture pattern here:

