Data Architecture: Challenges with Azure
For the past 8 months, I have been dealing with Azure, solving Data Management problems: ETL, Validation and Governance, the usual.
At first, I was excited. Azure is Azure: opinionated, developer friendly and intuitive.
I did a couple of Udemy courses on basic data management setups with Azure Data Factory and Azure Synapse, and felt confident that I could handle any challenges.
I also felt fairly confident that I could build this Data architecture that Azure recommends.
Then I saw her face. Now I'm a believer..
But.. some things in Azure were overly complex, and the documentation just led us down paths that ended in dead ends.. sometimes.
Here are some recent challenges I had with Azure (initially), especially when setting up Data and ETL pipelines..
The Challenges
IAM setup is.. wow.. it wasn't straightforward
Set up users. Set up user groups. Set up Resource groups. Set up so many groups. I was spending a lot of time just setting up groups. And access rights. And privileges. Only to find out that the permission I thought would give Data Factory access to Blob Storage didn't actually give it access to Blob Storage.
Now I know what you might be thinking: no, it isn't tough, we have been setting up Users, Roles and permissions in Azure for ages now, it's so easy. Well, I didn't find it as intuitive as I hoped it would be. I figured it out eventually, but it wasn't.. well.. intuitive!
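If you hit the same wall, one common culprit is assigning a control-plane role like Contributor when Blob Storage's data plane needs its own role, such as Storage Blob Data Contributor. Here's a rough sketch of that role assignment in Python, assuming a recent azure-mgmt-authorization; every ID in it is a placeholder, not my actual setup.

```python
# Rough sketch: grant a Data Factory's managed identity data-plane access to
# Blob Storage. Assumes a recent azure-mgmt-authorization (flat create
# parameters); all IDs below are placeholders.
import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"
storage_scope = (
    f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)
adf_principal_id = "<object-id-of-the-data-factory-managed-identity>"

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

# Control-plane roles like Contributor don't grant access to blob data;
# look up the data-plane role by name instead of hard-coding its GUID.
role = next(
    client.role_definitions.list(
        storage_scope, filter="roleName eq 'Storage Blob Data Contributor'"
    )
)

client.role_assignments.create(
    scope=storage_scope,
    role_assignment_name=str(uuid.uuid4()),
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=role.id,
        principal_id=adf_principal_id,
        principal_type="ServicePrincipal",  # managed identities are service principals
    ),
)
```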
Setting up Azure Purview looked easy at first.. but it wasn't
I kept getting cryptic permission errors when I wanted to see the Reports from a scan I had run. I was puzzled: I had given myself permissions in Purview to see just about everything, yet it was blocking me. The documentation wasn't giving me any hints.
Then I asked GPT, and it seemed to have an idea of what I had to do. I learnt that when you create Data Domains in Purview, those get their own permissions. At the very least, the error message could have told me where I was being blocked. But nope. I wasted a couple of hours figuring this out.
Trying to create a Python-based Azure Function took me.. 2 days!
I tried to follow the documentation to use VS Code to develop the function. Installed the Azure extensions in VS Code. Logged in. Tried to push my function code and got some weird ENOENT error that I couldn't make heads or tails of. Googling and GPTing didn't help. I scoured the forums, Stack Overflow, the documentation, and finally ended up on a Medium article that alluded to the fact that we need a function.json file. Also, I had my code in the root folder, when in fact the function's Python file should be in its own folder. What the actual f#$k!
Finally, after toiling for a couple of days, I got my function to work. No thanks to the documentation. Why can't they just help me out with a simple wizard to create functions? They know the settings. It doesn't have to be this hard!
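For anyone else fighting the same layout, here's roughly what the working structure looks like: the function's Python file sits in its own folder, named after the function, next to a function.json that declares its bindings, with host.json and requirements.txt at the root. A minimal sketch, assuming the v1 Python programming model and an HTTP trigger (the trigger type, folder and names are placeholders for illustration, not necessarily what I was building):

```python
# MyFunction/__init__.py: minimal sketch of an HTTP-triggered function in the
# v1 Python programming model. The trigger type is an assumption; use whatever
# binding your function.json declares.
#
# Expected layout:
#   host.json
#   requirements.txt        # must include azure-functions
#   MyFunction/
#       __init__.py         # this file; the folder name is the function name
#       function.json       # declares an "httpTrigger" in-binding named "req"
#                           # and an "http" out-binding named "$return"
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Echo a query parameter back, just to prove the bindings are wired up.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```

(For what it's worth, the newer v2 Python programming model moves the bindings into decorators and drops function.json entirely.)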
Azure Data Factory cannot use a Self-Hosted Integration Runtime
Well, it can, but only for Copy Data activities.
They also don't tell you that the Azure Integration Runtime and AutoResolve Integration Runtime get IP addresses allocated at random, from an exhaustive list of IP ranges per region, and that list gets updated from time to time.
So what's the problem?
You see, if your Data Factory needs to use the integration runtime to talk to a third-party system, and they are hell-bent on whitelisting IP addresses, then there is a huge list of CIDR blocks that they would have to whitelist.
This is the link to that list: Azure IP Ranges and Service Tags
Have at it. So we couldn't use AIRs to connect to the third-party system, which was Snowflake btw. Instead, I had to use Self-Hosted Integration Runtimes, which don't really perform as well as AIRs.
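If you want a feel for just how long that whitelist gets, here's a rough sketch that counts the Data Factory CIDR blocks for a single region from the downloaded Service Tags file (the file name and region tag below are placeholders; the actual download has a date stamp in its name):

```python
# Rough sketch: count the CIDR blocks a third party would have to whitelist
# for Azure Data Factory in one region. Assumes you've downloaded the
# "Azure IP Ranges and Service Tags - Public Cloud" JSON; the file name and
# region tag are placeholders.
import json

with open("ServiceTags_Public.json") as f:
    service_tags = json.load(f)["values"]

region_tag = "DataFactory.WestEurope"  # pick the tag for your region
prefixes = next(
    tag["properties"]["addressPrefixes"]
    for tag in service_tags
    if tag["name"] == region_tag
)

print(f"{region_tag}: {len(prefixes)} CIDR blocks to whitelist")
print(prefixes[:5], "...")
```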
Now, I know you will argue that AIRs are meant for internal pipelines, pipelines internal to Azure. However, this isn't evident, and they haven't made it easy to find out either.
Also, they should support scalable integration runtimes for sources like Databricks and Snowflake that can handle that scale. I dunno, I wouldn't want the source system to be a bottleneck.
Running Data Factory in Debug mode is.. well.. slow
At least, it was for a few days. It took time for the pipeline to start up and run in Debug mode. (It waited a while before getting the compute.)
Even then, the logs wouldn't tell me the status, not specifically anyway. It was a generic status update, and we weren't really sure what was going on with the pipeline.
A very good example was a Mapping Data Flow I was trying to debug one day. It hung for about 9 minutes before telling me it had failed, over a logic error. 9 minutes for me to find out I had made a mistake!
And on another occasion, this happened:
It was stuck like this for 2.5 hours; no logs to tell me what in God's name it was trying to do. I had to kill the pipeline and restart it, only for it to execute within 15 minutes the next time, successfully.
Make sure your Storage and Pipelines are in the same region
We learnt the hard way that the pipeline expects the storage to be in the same region, or you get a weird "IP not found" error. No idea whether this is by design.
(An unrelated quote btw)
We know you've tried because you've had problems
Azure has a way to go before it becomes, well, forgiving and friendly. I guess maybe this isn't such a bad thing after all. If platforms remain this complicated, ain't no way AI is taking over our jobs.
However, let me come right out and say it: I still love Azure as a platform. For everything it gets wrong, it gets twice as many things right.
Since we got past the teething issues, we have built some extremely complex systems in Azure. At scale. It performs flawlessly, and it has great instrumentation with Monitor and App Insights.
Once you get the hang of it, Purview is fun. It actually is a lot of fun: looking for your data, cataloguing it and classifying it. There's something about neatly organizing your data into hierarchies and folders that feels.. satisfying!
In hindsight, I might have gotten some things wrong. The documentation doesn't tell me if I did. Maybe some of you will tell me if I was just being paranoid.. or delusional. I would love to know your opinions!
Leave a comment and let me know if you guys faced similar issues with Azure.
Follow me, Ritesh Shergill,
for more articles on
AI/ML
Tech
Career advice
User Experience
Leadership
I also do
Career Guidance counselling: https://topmate.io/ritesh_shergill/149890
Mentor Startups as a Fractional CTO: https://topmate.io/ritesh_shergill/193786