Position Summary
1. Standardize service processes and systems, and extend them to related business units to raise the overall standard.
2. Innovate processes by reviewing the investment validity of a new system and developing it to improve operation system efficiency.
3. Enhance the service system by maintaining and improving the operation system and supporting stable process operation.
Role and Responsibilities
1. Operational support for Samsung internal GPU clusters utilized for AI training
2. Manage Kubernetes-related configurations
3. Troubleshoot issues in the Kubernetes layer
4. [Service system strategy and standard setting] Participate in developing effective and innovative system operation measures by establishing service-system-related strategies and standards.
5. [Process standardization] Plan to improve, standardize, and innovate processes by analyzing service process issues and vulnerabilities.
6. [Process propagation] Prepare to achieve differentiation, reduce costs, and increase efficiency of operation by expanding standard processes across the related business units.
7. [System operation and maintenance] Prepare to optimize the system by examining defects in the operation system or areas to be improved and reflecting them in the system.
8. [New system development] Prepare to upgrade service operation system by introducing a new system or re-building a system in order to prepare for paradigm changes.
9. [System investment deliberation] Make an investment plan for each specific system-related item and participate in deliberation.
Skills and Qualifications
Support operation of multiple GPU-based Kubernetes clusters:
- Monitor Cluster Status and Condition
- RBAC Management
- Issue Handling
- 24/7 on-call Operational Support
Competencies
Desired Skills and Experience:
Linux
- [Mandatory] Familiar with the Linux environment, especially Ubuntu
Kubernetes
- [Mandatory] Knowledge of Kubernetes
- Experience operating Kubernetes for GPU clusters is a good addition
Configuration Management Tools
- [Mandatory] Familiar with configuring YAML files using Linux text editors
Monitoring & Logging
- Familiar with Prometheus and Grafana
- Familiar with Kibana
Skills and Qualifications
- Able to comply with company policies and procedures
- Able to analyze problems and define procedures before determining the appropriate actions
- Able to show a proactive and positive attitude
- Normally receives general instructions
- English for communication (Korean is a good addition)
- Interest or experience in Infrastructure Engineering or Cloud Engineering
- Bachelor's degree in Computer Science, Software Engineering, or a related discipline, or equivalent work experience, is required.
* Samsung has a strict policy on trade secrets. In applying to Samsung and progressing through the recruitment process, you must not disclose any trade secrets of a current or previous employer.