System Design Series: DuPont analysis using Distributed Snapshot

Feb 12, 2024

A financial firm maintains financial records of around 50,000 companies and wants to determine the financial performance of all these companies.

But first, since the title mentions it, we should define it.

What is DuPont analysis?

The DuPont analysis is a framework for analyzing the fundamental performance of an organization. It is a useful technique for decomposing the different drivers of return on equity (ROE).

DuPont analysis is a useful tool to compare the operational efficiency of two similar firms.

Essentially, it evaluates the component parts of a company’s ROE. This allows an investor to determine what financial activities contribute the most to the changes in ROE and thus performance.

The formula for calculating DuPont analysis is as follows:

Ref: Investopedia — DuPont Analysis: The DuPont Formula Plus How to Calculate and Use It
https://www.investopedia.com/terms/d/dupontanalysis.asp
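
For readers who do not have the linked page at hand, the standard three-step DuPont decomposition can be written out as below. This is the commonly stated textbook form rather than anything specific to the pipelines in this article, and some sources use period-end balances instead of averages in the denominators.

```latex
\begin{align*}
\text{ROE} &= \text{Net Profit Margin} \times \text{Asset Turnover} \times \text{Equity Multiplier} \\
           &= \frac{\text{Net Income}}{\text{Revenue}}
              \times \frac{\text{Revenue}}{\text{Average Total Assets}}
              \times \frac{\text{Average Total Assets}}{\text{Average Shareholders' Equity}}
\end{align*}
```

Revenue and Average Total Assets cancel across the three ratios, which is why the product reduces to Net Income divided by Average Shareholders' Equity.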

So what does this have to do with the distributed snapshot pattern?

Well, these calculations are likely to be heavy, given that ETL pipelines or costly database queries would be needed to generate aggregate figures such as Net Income, Average Total Assets, etc.

By carrying out these calculations in parallel for all 50,000 companies, we can get the numbers required to finally calculate the DuPont analysis. In essence, companies would already have calculated these numbers as part of their financial reports, so there is no need to calculate them again. Instead, as we calculate these individual numbers, we store them in a centrally accessible location such as an S3 bucket or a database. This is what the architecture looks like:

Now we use ETL pipelines to process the individual calculations from these buckets or databases to form the distributed snapshot of each intermediate calculation.

These pipelines can be parallelized for each company to make the calculation faster.
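
To make the per-company parallelism concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Financials` record, the sample companies and their numbers, and the in-memory dictionary that stands in for the S3 bucket or database.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# Hypothetical raw inputs per company; in practice these would come from
# ETL pipelines or database queries.
@dataclass(frozen=True)
class Financials:
    company_id: str
    net_income: float
    revenue: float
    avg_total_assets: float
    avg_equity: float


def compute_components(f: Financials) -> dict:
    """Compute the three DuPont components for a single company."""
    return {
        "net_profit_margin": f.net_income / f.revenue,
        "asset_turnover": f.revenue / f.avg_total_assets,
        "equity_multiplier": f.avg_total_assets / f.avg_equity,
    }


def build_snapshots(companies: list, max_workers: int = 8) -> dict:
    """Run the per-company computation in parallel and collect the results.

    The returned dict is a stand-in for a central store (S3, database).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(compute_components, companies)
        return {c.company_id: r for c, r in zip(companies, results)}


# Made-up sample data for two companies.
companies = [
    Financials("ACME", net_income=100.0, revenue=1000.0,
               avg_total_assets=500.0, avg_equity=250.0),
    Financials("GLOBEX", net_income=50.0, revenue=400.0,
               avg_total_assets=800.0, avg_equity=200.0),
]
snapshot_store = build_snapshots(companies)
```

A thread pool suits the case where each company's work is dominated by I/O such as database queries. If the work were CPU-bound, a process pool or a distributed job runner would likely be a better fit.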

We have now stored the distributed snapshot of:

💿 Net Profit Margin

💿 Asset Turnover

💿 Equity Multiplier

And we can reuse these numbers in any other calculation.

Now, assuming that we need to calculate the DuPont analysis for all 50,000 companies, all we need to do is fetch the snapshot of each individual calculation and perform the final calculation:

DuPont Analysis (ROE) = Net Profit Margin * Asset Turnover (AT) * Equity Multiplier (EM)

This can be done quickly by another ETL pipeline meant specifically for calculating DuPont Analysis.

And thus, we will quickly start getting results in the final Results table as the processing happens in parallel.
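
The final step can be sketched as follows. The three dictionaries are hypothetical stand-ins for the stored snapshots, and the company ids and values are made up. The sketch only computes a result for companies whose three components are all available, so the rest can be picked up on a later run.

```python
# Hypothetical stored snapshots, one per intermediate calculation, keyed by
# company id. In practice these would be read from S3 or a database.
net_profit_margin = {"ACME": 0.10, "GLOBEX": 0.125, "INITECH": 0.2}
asset_turnover = {"ACME": 2.0, "GLOBEX": 0.5}
equity_multiplier = {"ACME": 2.0, "GLOBEX": 4.0}


def dupont_roe(company_id: str) -> float:
    """Final step: multiply the three stored components for one company."""
    return (
        net_profit_margin[company_id]
        * asset_turnover[company_id]
        * equity_multiplier[company_id]
    )


# Only companies with all three snapshots present are ready for the final
# calculation; INITECH is skipped until its other components arrive.
ready = net_profit_margin.keys() & asset_turnover.keys() & equity_multiplier.keys()

# Stand-in for the final Results table: one ROE value per company.
results = {cid: dupont_roe(cid) for cid in sorted(ready)}
```

Because this step is only three lookups and two multiplications per company, it is cheap compared with producing the intermediate numbers.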

It is evident that the Distributed Snapshot pattern is applicable to data pipelines.

What are the benefits we get with this approach?

🟠 Calculations for each company can be carried out independently and in parallel to speed up the process significantly.

🟢 We don’t need to wait for each calculation to be completed. We can always calculate from the latest value or a previous value, so calculating historical numbers becomes easier.

🔵 We can use AI/ML techniques to forecast values from computed snapshots.

🟣 We can re-use subparts of the computations in other calculations. For example, Net Income could be used in another calculation by another pipeline.

🟡 As each computation is independent, reporting tools can function independently from each other by referring to the single source of truth. Reconciliation also becomes easier as there is a single source of truth.

🟤 Retroactive corrections for calculation errors can be applied easily if erroneous values were fed in, and they don’t affect the other calculations.

Thus, the Distributed snapshot pattern divides intermediate computations into separate and re-usable concerns while also maintaining historical snapshots.
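
As a rough illustration of the historical-snapshot and retroactive-correction points, here is a small Python sketch. The `SnapshotStore` class is hypothetical and purely in-memory, and the dates and values are made up. A real store would sit on a database or object storage.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class SnapshotStore:
    """In-memory stand-in for a store that keeps dated snapshots per metric."""

    def __init__(self):
        # (company_id, metric) -> list of (as_of_date, value), sorted by date
        self._history = defaultdict(list)

    def put(self, company_id, metric, as_of, value):
        """Add a snapshot, or overwrite the one for the same date (a correction)."""
        series = self._history[(company_id, metric)]
        dates = [d for d, _ in series]
        i = bisect_right(dates, as_of)
        if i > 0 and dates[i - 1] == as_of:
            series[i - 1] = (as_of, value)
        else:
            series.insert(i, (as_of, value))

    def get(self, company_id, metric, as_of=None):
        """Return the latest value, or the latest value on or before `as_of`."""
        series = self._history.get((company_id, metric), [])
        if not series:
            raise KeyError((company_id, metric))
        if as_of is None:
            return series[-1][1]
        i = bisect_right([d for d, _ in series], as_of)
        if i == 0:
            raise KeyError((company_id, metric, as_of))
        return series[i - 1][1]


store = SnapshotStore()
store.put("ACME", "asset_turnover", date(2023, 12, 31), 1.8)
store.put("ACME", "asset_turnover", date(2024, 3, 31), 2.0)
# Retroactive correction of the year-end value; other snapshots are untouched.
store.put("ACME", "asset_turnover", date(2023, 12, 31), 1.9)

latest = store.get("ACME", "asset_turnover")
historical = store.get("ACME", "asset_turnover", as_of=date(2024, 1, 15))
```

Here `latest` picks up the most recent snapshot, while `historical` resolves to the corrected year-end value. Overwriting in place is one design choice; a store that must be auditable might instead keep every version alongside the correction.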

I have documented the specifics of the distributed snapshot architecture pattern here —

The Distributed-Snapshot Pattern

riteshshergill.medium.com

Follow me, Ritesh Shergill, for more articles on

👨‍💻 Tech

👩‍🎓 Career advice

📲 User Experience

🏆 Leadership

I also do

✅ Career Guidance counselling — https://topmate.io/ritesh_shergill/149890

✅ Mentor Startups as a Fractional CTO — https://topmate.io/ritesh_shergill/193786