Who we are:
Fulcrum Digital is an agile, next-generation digital acceleration company that provides digital transformation and technology services from ideation through implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.

Responsibilities:

Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health with automated alerts.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and detailed postmortems.
Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recovery.
Work with a global team spread across tech hubs in multiple geographies and time zones.
Share knowledge and mentor junior resources.
Primary skills: messaging (Kafka, MQ, NATS, Flink), configuration management tools (Chef Infra, Habitat, Ansible), CI/CD (Bitbucket, Jenkins, XLR), scripting (Shell, Python), and basic programming in Java.
Secondary skills: event management tools (Splunk, Dynatrace, Prometheus) and cloud (AWS preferred).

Qualifications:

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
Experience with algorithms, data structures, scripting, pipeline management, and software design.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability to help debug, optimize code, and automate routine tasks.
We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
Experience in one or more of the following is preferred: Python, Go, Bash Scripting.
Interest in designing, analysing, and troubleshooting large-scale distributed systems.
We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
For work on our ops team, we need engineers with experience in industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, and Chef. Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.
Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating.
Practice sustainable incident response and blameless post-mortems.
For team members supporting the DevOps pipeline:
Design, implement, and enhance our deployment automation based on Chef. We need proven experience writing Chef recipes/cookbooks as well as designing and implementing an overall Chef-based release and deployment process.
Use Jenkins to orchestrate builds as well as link to Sonar, Chef, Maven, Artifactory, etc. to build out the CI/CD pipeline.
Support deployments of code into multiple lower environments. Support current processes as needed, with an emphasis on automating everything as soon as possible.
Design and implement a Git-based code management strategy that will support multiple environment deployments in parallel. Experience with automation for branch management, code promotions, and version management is a plus.

Requirements

Proficiency in languages like Python, Go, Java, or Bash for automation scripts, tools, and integrations. Involves writing clean, maintainable code, debugging, API interactions, version control (e.g., Git), and unit testing.
In-depth knowledge of Linux/Unix. Includes managing processes, file systems, permissions, kernel tuning, shell scripting, server configuration, updates, and security.
Expertise in cloud platforms (AWS, GCP, Azure) and tools like Terraform or CloudFormation for infrastructure as code (IaC). Includes managing virtual machines, serverless architectures, and container orchestration (e.g., Kubernetes, ECS) for scalability and high availability.
Understanding of TCP/IP, HTTP, DNS, load balancing, VPNs, and firewalls. Includes configuring network services and troubleshooting with tools like Wireshark or traceroute.
Proficiency in tools like Splunk, Dynatrace, Prometheus, Grafana, Datadog, Jaeger/Zipkin for logs, metrics, and tracing. Involves defining Service Level Indicators (SLIs), setting Service Level Objectives (SLOs), and creating dashboards for system health insights.
Experience with CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions. Includes automating build, test, and deployment processes, as well as rollback mechanisms for reliability.
Skills in diagnosing and resolving production issues using logs, metrics, and debugging tools. Includes incident management (e.g., PagerDuty), root cause analysis (RCA), and blameless postmortems.
Expertise in managing and operating Apache Kafka, NATS and MQ. Includes configuring topics (Kafka) or subjects (NATS), ensuring high availability, scaling clusters, monitoring performance metrics (e.g., consumer lag, throughput), and troubleshooting issues like message loss or latency. Involves understanding partitioning (Kafka) and pub/sub patterns (NATS) for event streaming and messaging.
