[Image 1: teacher-student knowledge distillation diagram for RUL prediction. Readable labels: Inputs: Input C (Complete Data), Input M (Missing Data); Teacher Model (Pretrained): Transformer Encoder T, Internal Features, Teacher Prediction; Student Model A (Handles Missing Input): Transformer Encoder S_A, Student A Prediction, Prediction Loss, Feature Alignment, Total Loss A, Backpropagation; Student Model B (Handles Missing Labels): Transformer Encoder S_B, Student B Prediction, Prediction Loss, Total Loss B, Backpropagation; Ground Truth RUL Labels; Final Output: Final RUL Prediction (y_s).]
[Image 2: Fig. 6. Architecture of the proposed MSCATN. Readable labels: Input Embedding, Output Embedding, Output (shifted right), Encoder #N and Decoder #N blocks (Multi-Head Attention, Add & Norm, Feed Forward), Linear, Cross Adaptive Layer (Multi-Head Cross Attention, Sigmoid), y_pred, y_label, L_MMD, L_MSE, L_distillation, and a total loss L_total minimized as a weighted sum of L_distillation, L_MMD, and L_regression.]
I'm reposting my question. Please do not copy and paste from the previous answers: they did not address what I need, which is why I'm asking again.
The knowledge distillation part is not very clear in the diagram. Please create two new diagrams by separating the two student models:
- First Diagram (Student A - Missing Values):
  - Clearly illustrate the student training process.
  - Show how knowledge distillation happens between the teacher and Student A.
  - Explain what the teacher teaches Student A (e.g., handling missing values) and how this teaching occurs (e.g., through logits, features, or attention).
- Second Diagram (Student B - Missing Labels):
  - Similarly, detail the training process for Student B.
  - Clarify how knowledge distillation works between the teacher and Student B.
  - Specify what the teacher teaches Student B (e.g., dealing with missing labels) and how the knowledge is transferred.
Since these are two distinct challenges (missing values vs. missing labels), they should not be combined in the same diagram. Instead, create two separate diagrams for clarity.
For reference, I am attaching a second image (Fig. 6, the architecture of the proposed MSCATN) as an example of the level of detail I expect in both diagrams (Student A and Student B).
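To make the request more concrete, here is a rough sketch of how I currently read the two training objectives in the first diagram. It is only my interpretation, not something stated in the source: the function names, the use of MSE for both the prediction loss and the feature alignment term, and the weight `alpha` are all hypothetical. As far as I can tell, Student A combines a prediction loss against the ground-truth RUL labels with a feature alignment term against the teacher, while Student B has no labels and compares its prediction to the teacher's prediction instead.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_a_loss(y_student, y_true, feat_student, feat_teacher, alpha=0.5):
    # Assumed form (hypothetical): prediction loss on ground-truth RUL labels
    # plus a weighted feature-alignment term toward the teacher's features.
    return mse(y_student, y_true) + alpha * mse(feat_student, feat_teacher)


def student_b_loss(y_student, y_teacher):
    # Assumed form (hypothetical): labels are missing, so the teacher's
    # prediction is used as the regression target (a pseudo-label).
    return mse(y_student, y_teacher)


# Toy numbers, for illustration only.
loss_a = student_a_loss([100.0, 80.0], [110.0, 80.0], [0.0, 1.0], [0.0, 3.0])
loss_b = student_b_loss([100.0, 80.0], [104.0, 80.0])
```

If this reading is wrong, the new diagrams should make clear which quantities are actually compared in each loss and where the gradients flow.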










