Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services, from ideation through implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.
Responsibilities:
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
- Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health with automated alerts.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and detailed postmortems.
- Take a holistic approach to problem solving by connecting the dots across the technology stacks that make up the platform during a production event, to optimize mean time to recover.
- Work with a global team spread across tech hubs in multiple geographies and time zones.
- Share knowledge and mentor junior team members.
- Primary skills: messaging (Kafka, MQ, NATS, Flink), configuration management tools (Chef Infra, Habitat, Ansible), CI/CD (Bitbucket, Jenkins, XLR), scripting (Shell, Python), and programming language basics (Java).
- Secondary skills: event management tools (Splunk, Dynatrace, Prometheus) and cloud (AWS preferred).
Qualifications:
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- Experience with algorithms, data structures, scripting, pipeline management, and software design.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to help debug, optimize code, and automate routine tasks.
- We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
- Experience in one or more of the following is preferred: Python, Go, Bash scripting.
- Interest in designing, analysing, and troubleshooting large-scale distributed systems.
- We need team members with an appetite for change and for pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and build relationships is a must.
- For work on our ops team, we need engineers with experience in industry-standard CI/CD tools like Git/Bitbucket, Jenkins, and Chef. Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating.
- For team members supporting the DevOps pipeline:
  - Design, implement, and enhance our deployment automation based on Chef. We need proven experience writing Chef recipes/cookbooks as well as designing and implementing an overall Chef-based release and deployment process.
  - Use Jenkins to orchestrate builds and to link to Sonar, Chef, Maven, Artifactory, etc. to build out the CI/CD pipeline.
  - Support deployments of code into multiple lower environments, supporting current processes as needed with an emphasis on automating everything as soon as possible.
  - Design and implement a Git-based code management strategy that will support multiple environment deployments in parallel. Experience with automation for branch management, code promotions, and version management is a plus.
Requirements:
- Proficiency in languages like Python, Go, Java, or Bash for automation scripts, tools, and integrations. Involves writing clean, maintainable code, debugging, API interactions, version control (e.g., Git), and unit testing.
- In-depth knowledge of Linux/Unix. Includes managing processes, file systems, permissions, kernel tuning, shell scripting, server configuration, updates, and security.
- Expertise in cloud platforms (AWS, GCP, Azure) and tools like Terraform or CloudFormation for infrastructure as code (IaC). Includes managing virtual machines, serverless architectures, and container orchestration (e.g., Kubernetes, ECS) for scalability and high availability.
- Understanding of TCP/IP, HTTP, DNS, load balancing, VPNs, and firewalls. Includes configuring network services and troubleshooting with tools like Wireshark or traceroute.
- Proficiency in tools like Splunk, Dynatrace, Prometheus, Grafana, Datadog, and Jaeger/Zipkin for logs, metrics, and tracing. Involves defining Service Level Indicators (SLIs), setting Service Level Objectives (SLOs), and creating dashboards for system health insights.
- Experience with CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions. Includes automating build, test, and deployment processes, as well as rollback mechanisms for reliability.
- Skills in diagnosing and resolving production issues using logs, metrics, and debugging tools. Includes incident management (e.g., PagerDuty), root cause analysis (RCA), and blameless postmortems.
- Expertise in managing and operating Apache Kafka, NATS, and MQ. Includes configuring topics (Kafka) or subjects (NATS), ensuring high availability, scaling clusters, monitoring performance metrics (e.g., consumer lag, throughput), and troubleshooting issues like message loss or latency. Involves understanding partitioning (Kafka) and pub/sub patterns (NATS) for event streaming and messaging.
Benefits