Mid-Level Software Integration Engineer
- Ability to integrate, install, configure, upgrade, compile, and support COTS/GOTS software.
- Generate documentation for the full software stack.
- Update software for sustainment support.
- Basic Linux system administration skills and shell scripting.
- Execute test code to characterize software performance.
- Provide software product ownership for HPC tools.
- Working knowledge of CM tools, web documentation, and issue tracking.
- Ability to work in a fast-paced environment and switch between various architectural paradigms.
- Develop software tool plugins in the language in which the tools are written.
- Determine the optimal, or best available, HPC configuration for customer needs.
- Analyze software and/or system requirements and various system engineering documents, acquisition plans and software/system descriptions to develop evaluation and test plans and procedures.
- Liaise with Project Directors, software developers, system administrators, hardware maintenance teams and the test team during software and/or system tests on High Performance Computing Systems (HPCs).
- Provide testing expertise and recommendations throughout the full system development lifecycle.
- Develop test plans, test scenarios, and test cases for software and system tests to be run on HPC architectures.
- Provide full-scope system testing on HPCs, including but not limited to functional, performance, operational, and mission simulation testing.
- Collaborate with the test team to review, verify, validate and refine test plans prior to execution.
- Generate test reports to capture the results of software and system level testing.
- Perform initial software installation, software integration, and software testing on HPCs.
- Troubleshoot software installation and configuration issues, and collaborate with software developers and project managers to resolve them.
- Install and test software revisions, verifying functionality and capabilities.
- Train site personnel to operate, troubleshoot, report and maintain developed and deployed software packages that are installed on the HPC systems.
- Optimize customer-written test programs.
- Plan and conduct data collection and analysis, and report status and results.
- Write Standard Operating Procedures (SOPs), installation guides, configuration guides, and troubleshooting guides.
- Develop test scripts for system testing.
- Keep the team's test script repository current with new and updated test scripts.
- Provide post-operational test support to operational systems.
- Manage and monitor a large Linux cluster.
- Update and patch system packages.
- Modify system configurations as needed to meet customer and mission needs.
- Oversee hardware fixes and changes to the system.
- Manage configuration control of the system.
- Document procedures and processes for supporting the system.
Technical Skills Required:
- A minimum of 5 years’ experience writing scripts using Bash/Python
- A minimum of 5 years’ experience with the Unix command line
- A minimum of 5 years’ experience performing Unix system administration, including installation, configuration, and support of COTS/GOTS software in a large-scale Unix HPC cluster environment
- General HPC technical knowledge regarding compute, network, memory, and storage components
- Demonstrated experience supporting large Unix HPC Clusters
- Familiarity with network communication protocols and interconnects such as IP and InfiniBand
- Excellent verbal and written communication skills
- Experience with Configuration Management, including versioning and automated tools such as Puppet, Chef, Salt, and Ansible
- Demonstrated experience with the sustainment, support, maintenance, development, and deployment of Lustre-based HPC parallel file systems
- Familiarity with Site Reliability Engineering (SRE) principles and applications
- Demonstrated experience using system monitoring tools such as Nagios and ibmonitor
- Demonstrated experience developing test plans, procedures, and reports ensuring consistency across the storage architecture
Special Technical Skills Desired:
- Experience with the Atlassian Tool Suite (JIRA, Bitbucket, Confluence)
- Familiarity with test-driven Agile development best practices
Minimum Experience Required:
- Bachelor’s Degree in Computer Science or related field and have at least ten (10) years of demonstrable experience with integrating, installing, configuring, upgrading, compiling, and supporting COTS/GOTS software in a heterogeneous operating system environment OR
- In lieu of a degree, five (5) years of full-time work directly related to Computer Science, plus at least ten (10) years of demonstrable experience OR
- An industry recognized professional certification, as defined in the TT0s, may substitute as one (1) year experience OR
- A Master’s Degree in Computer Science or related field may substitute for two (2) years’ experience.
- TS/SCI clearance with polygraph required