Description of Driving Event:
This Lesson Learned is based on Reliability Practice No. PD-ED-1250, from NASA Technical Memorandum 4322A, NASA Reliability Preferred Practices for Design and Test.
This practice significantly enhances the probability of mission success by ensuring that problems/failures occurring during ground test are properly identified, documented, assessed, tracked and corrected in a controlled and approved manner. Another benefit of the Problem/Failure Report (PFR) procedure is that it provides data on problem/failure trends. Trend data may then be analyzed so that errors are not repeated on future hardware and software.
Implementation Method:
A formal procedure (Reference 1) establishes the GSFC system for the identification and documentation of Problem/Failure Reports (PFRs), the data collection and monitoring of PFRs, the risk assessment and rating of problems/failures, and the Project and Center management approvals of corrective actions. For hardware, the procedure begins with the first application of power (or first test usage for mechanical items) at the lowest level of assembly of qualification or flight configuration hardware. For software, the procedure begins with the first test use of the software item with a hardware item of the mission system at the component level or higher.
Identification and Documentation of PFRs:
Anyone with knowledge of a problem or failure may initiate a PFR. PFRs are generated for any departure from a design, performance, testing, or handling requirement that affects the function of flight equipment or of ground support equipment that interfaces with flight equipment, or that could compromise mission objectives. All GSFC-generated PFRs begin as problem records identified on applicable Certification Logs (Ref. 3), Problem Records (Ref. 4) and work orders. After a problem/failure occurrence, the Certification Log and Work Order, as applicable, are used to document, define and control any repairs, fixes, or modifications that may be performed on the hardware or software.
PFRs are generated and entered electronically into a PFR database, normally accessible on designated personal computers in hardware assembly and test areas. The PFR database program's on-screen instructions indicate the data that must be provided, and the program automatically assigns a PFR number for traceability. PFRs can be prepared manually on GSFC PFR Form 4-2 (Figures 1 & 2) when the PFR database is not available or the system is inoperative. The PFR data on this form is entered into the database at the earliest opportunity following the problem/failure, and must in any case be filed within 24 hours.
All PFRs, in both hardcopy and electronic formats, are forwarded to the Project Office for routing to designated personnel. Contractor-generated PFRs are provided to the Project Office in accordance with applicable contract requirements. PFRs generated in-house or at launch sites are provided directly to the Project Flight Assurance Manager (FAM) and the Systems Assurance Manager (SAM) or their designees, as appropriate. The Project FAM/SAM or their designees enter all the hardcopy PFRs that they receive into the GSFC PFR database.
Data Collection and Monitoring of PFRs:
The FAM/SAM provide copies of all PFRs to the members of a Failure Review Board (FRB) that is established and authorized by each project to investigate, analyze, and determine the cause of each problem/failure and the appropriate disposition of each PFR. The FRB consists of the following project personnel as a minimum:
- The project FAM/SAM or designees (serving as chairman)
- The Project Manager or designee
- The cognizant engineer(s) responsible for the failed item(s).
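The manual-entry path described above (a paper Form 4-2 later keyed into the database, with automatic numbering and a 24-hour filing limit) can be sketched as follows. This is a minimal illustration, not the GSFC database design: the class name `ManualPFR`, its fields, the sequential counter, and the sample dates are all hypothetical, and the sketch assumes the 24 hours are measured from the problem/failure occurrence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import count
from typing import Optional

# Assumption: the 24-hour limit runs from the problem/failure occurrence.
FILING_LIMIT = timedelta(hours=24)

# Hypothetical stand-in for the database's automatic PFR numbering.
_pfr_numbers = count(1)


@dataclass
class ManualPFR:
    """A PFR first written on paper and later entered into the database."""
    description: str
    occurred_at: datetime
    entered_at: Optional[datetime] = None
    pfr_number: Optional[int] = None

    def enter_into_database(self, when: datetime) -> None:
        """Record the entry time and assign a number for traceability."""
        self.entered_at = when
        self.pfr_number = next(_pfr_numbers)

    def filed_on_time(self) -> bool:
        """True only if the PFR was entered within the filing limit."""
        if self.entered_at is None:
            return False
        return self.entered_at - self.occurred_at <= FILING_LIMIT


# Illustrative values only: entry 12.5 hours after the occurrence.
pfr = ManualPFR("Telemetry dropout during test", datetime(2024, 1, 10, 8, 0))
pfr.enter_into_database(datetime(2024, 1, 10, 20, 30))
print(pfr.filed_on_time())  # True
```

A report that has not yet been entered is treated as not filed, which keeps an overdue paper form from silently passing the check.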
Continuation of operations or immediate initiation of troubleshooting operations after the occurrence of a problem or failure is at the discretion of the cognizant engineer and other members of the FRB as appropriate and is based on the following factors:
- The number and severity of problem/failure occurrences during the operation in question;
- The degree of confidence in the ability to quickly diagnose the source or cause of the problem or failure;
- The potential risk to hardware or software posed by continuing the operations after the problem or failure;
- The consequences of terminating an operation in which a failure has occurred (e.g., thermal vacuum testing).
Risk Assessment and Rating of Problems/Failures:
The FRB must consider the residual risk of each PFR before the PFR can be considered for closure. A two-factored risk rating system is used. The first rating factor identifies the impact that the problem/failure would have on the flight hardware and/or software performance capabilities if it occurred during the mission. Redundancy is ignored in establishing this rating. The failure effect ratings are illustrated in Table 1.
Table 1. PFR Failure Effect Ratings
| RATING | FAILURE EFFECT |
| --- | --- |
| 3 | Catastrophic or major degradation to mission, system or instrument performance, reliability or safety |
| 2 | Moderate or significant degradation to mission, system or instrument performance, reliability or safety, defined as: a) Appreciable change in functional capability; b) Appreciable degradation of engineering or science telemetry; c) Causing significant operational difficulties or constraints; d) Causing reduction in mission lifetime |
| 1 | Negligible or no impact on mission, system, or instrument performance, reliability or safety |
The second rating factor identifies the confidence in understanding both the cause(s) of the problem/failure and the effectiveness of the resulting corrective action. The failure corrective action ratings are shown in Table 2.
Table 2. PFR Failure Corrective Action Ratings
| RATING | FAILURE CORRECTIVE ACTION |
| --- | --- |
| 1 | The root cause of the problem/failure has been determined with confidence by analysis and/or test. Corrective action has been determined, implemented and verified with certainty. There exists no credible possibility of problem/failure recurrence. |
| 2 | The root cause of the problem/failure has not been determined with confidence. However, some corrective action has been determined, implemented and verified with certainty. There exists some possibility that the problem/failure may recur. |
| 3 | The root cause is considered to be known and understood with confidence. Corrective action has not been determined, implemented or verified with certainty. There exists some possibility that the problem/failure may recur. |
| 4 | The root cause has not been defined with certainty. Corrective action has not been determined, implemented or verified with certainty. There exists some possibility that the problem/failure may recur. |
Any PFR with a failure effect rating of "2" or "3", coupled with a failure corrective action rating of "3" or "4", is identified as a Red Flag PFR, indicating significant residual risk associated with the problem/failure in question. GSFC Project Manager signatures are required on all Red Flag PFRs. No delegation of signature authority is permitted. All Red Flag PFRs are highlighted at GSFC Project Flight Assurance Reviews.
Project and Center Management Approval of Corrective Actions:
A PFR is considered for closure when the FRB determines that appropriate and sufficient investigation of the cause of the problem/failure has been completed and that commensurate corrective action has been completed and properly documented. The following signatures are required to formally close a PFR:
- The Contractor Program/Project Manager and Contractor QC for PFRs initiated by off-site GSFC contractors
- The GSFC FRB chairman
- The GSFC Project Manager or a delegated FRB representative. Red Flag PFRs, as defined above, require the signature of the Project Manager.
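The Red Flag rule and the closure signatures described above combine two ratings into a single decision, which can be sketched in a few lines. This is a minimal illustration under stated assumptions: the enum member names are hypothetical shorthand for the Table 1 and Table 2 descriptions, and `is_red_flag` and `required_signatures` are illustrative helpers, not part of any GSFC system.

```python
from enum import IntEnum
from typing import List


class FailureEffect(IntEnum):
    """Failure effect rating (redundancy ignored). Names are hypothetical."""
    NEGLIGIBLE = 1
    MODERATE = 2
    CATASTROPHIC = 3


class CorrectiveAction(IntEnum):
    """Failure corrective action rating. Names are hypothetical shorthand."""
    CAUSE_KNOWN_ACTION_VERIFIED = 1
    CAUSE_UNKNOWN_ACTION_VERIFIED = 2
    CAUSE_KNOWN_ACTION_UNVERIFIED = 3
    CAUSE_UNKNOWN_ACTION_UNVERIFIED = 4


def is_red_flag(effect: FailureEffect, action: CorrectiveAction) -> bool:
    """Red Flag: effect rating 2 or 3 together with action rating 3 or 4."""
    return (effect >= FailureEffect.MODERATE
            and action >= CorrectiveAction.CAUSE_KNOWN_ACTION_UNVERIFIED)


def required_signatures(effect: FailureEffect, action: CorrectiveAction,
                        offsite_contractor: bool) -> List[str]:
    """List the signatures needed to formally close a PFR."""
    signers: List[str] = []
    if offsite_contractor:
        signers += ["Contractor Program/Project Manager", "Contractor QC"]
    signers.append("GSFC FRB chairman")
    if is_red_flag(effect, action):
        # Signature authority may not be delegated on Red Flag PFRs.
        signers.append("GSFC Project Manager")
    else:
        signers.append("GSFC Project Manager or delegated FRB representative")
    return signers


print(is_red_flag(FailureEffect.CATASTROPHIC,
                  CorrectiveAction.CAUSE_UNKNOWN_ACTION_VERIFIED))  # False
print(is_red_flag(FailureEffect.MODERATE,
                  CorrectiveAction.CAUSE_KNOWN_ACTION_UNVERIFIED))  # True
```

Note that a catastrophic failure effect alone does not raise a Red Flag: the flag marks residual risk, so it also requires that the corrective action remain unverified.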
Technical Rationale:
This practice ensures that all problem/failure occurrences, including minor glitches, are fully reported and dealt with as appropriate in a formal procedure with Project and GSFC management oversight and approval.
References:
1. GSFC Flight Assurance Procedure, No. FAP P-303-849, "Problem/Failure Reporting"
2. Guidelines for Standard Payload Assurance Requirements (SPAR) for GSFC Orbital Projects
3. The GSFC Certification Log, FAP P-303-820
4. Problem Record Items, GSFC, FAP P-303-845
5. NASA NHB 5300.4 (1A-1), Reliability Program Requirements for Aeronautical and Space Systems Contractors (January 1987)
6. Reliability Preferred Practice PD-AP-1304, Problem/Failure Report Independent Review/Approval
7. Reliability Preferred Practice PD-AP-1305, Risk Rating of Problem/Failure Reports
8. Reliability Preferred Practice PD-ED-1232, Spacecraft Orbital Anomaly Report (SOAR) System
9. Reliability Preferred Practice PD-ED-1255, Problem Reporting and Corrective Action System
Figure 1. GSFC PFR Form 4-2: Problem/Failure Report
Figure 2. GSFC PFR Form 4-2 (cont.): Instruction Sheet for Problem/Failure Report
Lesson(s) Learned:
Without this practice and its formal reporting procedures, problems/failures, particularly minor glitches, may be overlooked or judged not serious enough to investigate or report to Project Management. The problem/failure could then recur during the mission and cause significant degradation in performance.
Recommendation(s):
A formal procedure is followed in the reporting and documentation of problems/failures occurring during test, pre-launch operations, and launch operations for both hardware and software. A separate system, the "Spacecraft Orbital Anomaly Report (SOAR)", is used for the reporting, evaluation and correction of problems occurring on-orbit (see Reliability Preferred Practice No. PD-ED-1232).
Evidence of Recurrence Control Effectiveness:
This practice has been used on all GSFC in-house flight programs in accordance with the GSFC Quality Assurance Requirements defined in Reference 2. Out-of-house contractors may use their own system as long as it meets the provisions of the GSFC Quality Assurance Requirements and is approved by GSFC.
Documents Related to Lesson:
N/A
Mission Directorate(s):
- Exploration Systems
- Science
- Space Operations
- Aeronautics Research
Additional Key Phrase(s):
- Configuration Management
- Facilities
- Flight Equipment
- Ground Equipment
- Hardware
- Launch Process
- Launch Vehicle
- Logistics
- Parts Materials & Processes
- Payloads
- Risk Management/Assessment
- Safety & Mission Assurance
- Software
- Spacecraft
- Test & Verification
Additional Info: