Unix Technical Forum

LoadLeveler: job stuck in "(alloc)" state, won't run



Old 01-04-2008, 10:47 PM
Dan Stromberg
 


I just set up the following class:

com_rg4: type = class # class for medium jobs
priority = 60 # ClassSysprio
# cpu_limit= 08:00:00 # 2 hour run time limit
wall_clock_limit = 00:10:00 # Needed for BACKFILL scheduler
max_processors = 4 # default max processors for class (no limit)
max_total_tasks = 4

In LoadL_config.local, I have:

CLASS = small(8) medium(5) large(2) inter_class(8) all_spec(0) com_rg8(0)
com_rg32(0) com_sb8(0) com_sb32(0) com_rg4(4) com_sb4(4)
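
As a quick sanity check on the per-machine class counts, the CLASS line can be split into name/count pairs. This is only a minimal Python sketch, not LoadLeveler tooling: `parse_class_slots` is a hypothetical helper, and the wrapped line from the post has been joined into one string by hand.

```python
import re

# The CLASS setting quoted above, joined onto one line.
CLASS_LINE = (
    "CLASS = small(8) medium(5) large(2) inter_class(8) all_spec(0) com_rg8(0) "
    "com_rg32(0) com_sb8(0) com_sb32(0) com_rg4(4) com_sb4(4)"
)


def parse_class_slots(line):
    """Return {class_name: count} from a 'CLASS = name(count) ...' line.

    Hypothetical helper for illustration only.
    """
    _, _, value = line.partition("=")
    pairs = re.findall(r"(\w+)\((\d+)\)", value)
    return {name: int(count) for name, count in pairs}


slots = parse_class_slots(CLASS_LINE)
for name, count in slots.items():
    print(f"{name:12s} {count}")
```

Run against the line above, this lists com_rg4 and com_sb4 with a count of 4 each, which matches what was intended for this machine.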

I submitted some test jobs with the following stanza:

#@ job_name = tstclm01
#@ class = com_rg4
#@ node = 1
#@ tasks_per_node = 4
#@ output = $(job_name).txt
#@ error = $(job_name).txt
#@ job_type = parallel
#@ network.MPI = csss,shared,us
#@ node_usage = not_shared
#@ account_no = 36271012
## @ wall_clock_limit = 3800
#@ queue
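
To double-check what the job command file actually requests, the directives can be pulled out and compared with the class limit. The sketch below is Python written for illustration, with a hypothetical `parse_directives` helper. It assumes that a line starting with `##` is not a directive, which is consistent with the 600-second wall clock limit that llq -s reports further down.

```python
import re

# Job command file from the post, copied verbatim.
JOB_FILE = """\
#@ job_name = tstclm01
#@ class = com_rg4
#@ node = 1
#@ tasks_per_node = 4
#@ output = $(job_name).txt
#@ error = $(job_name).txt
#@ job_type = parallel
#@ network.MPI = csss,shared,us
#@ node_usage = not_shared
#@ account_no = 36271012
## @ wall_clock_limit = 3800
#@ queue
"""

# Assumption: a directive is "#", optional spaces, "@", then keyword [= value].
# A line starting with "##" does not match and is treated as a comment.
DIRECTIVE = re.compile(r"^#\s*@\s*(\w[\w.]*)\s*(?:=\s*(.*))?$")


def parse_directives(text):
    """Return {keyword: value} for each directive line (hypothetical helper)."""
    directives = {}
    for line in text.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match:
            directives[match.group(1)] = (match.group(2) or "").strip()
    return directives


directives = parse_directives(JOB_FILE)
total_tasks = int(directives["node"]) * int(directives["tasks_per_node"])
CLASS_MAX_TOTAL_TASKS = 4  # max_total_tasks from the com_rg4 stanza above

print("class:", directives["class"])
print("total tasks requested:", total_tasks)
print("within class limit:", total_tasks <= CLASS_MAX_TOTAL_TASKS)
print("wall_clock_limit set in job file:", "wall_clock_limit" in directives)
```

The request works out to one node with four tasks, which does not exceed the max_total_tasks = 4 set for com_rg4.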


However, llq reports:

bash-2.05b$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
esmf04m.498.0            zender      6/13 18:53 R  50  com_rg32     esmf08m
esmf04m.485.0            strombrg    6/10 16:55 I  50  all_sp8
esmf04m.529.0            zender      6/14 23:33 I  50  com_rg32
esmf04m.540.0            testacct    6/15 11:23 I  50  com_rg4      (alloc)
esmf04m.541.0            testacct    6/15 11:23 I  50  com_rg4      (alloc)
esmf04m.542.0            testacct    6/15 11:23 I  50  com_rg4      (alloc)
esmf04m.543.0            testacct    6/15 11:23 I  50  com_rg4      (alloc)

7 job step(s) in queue, 6 waiting, 0 pending, 1 running, 0 held, 0 preempted
bash-2.05b$
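
For watching the queue while testing, the llq listing can be filtered down to the job steps that are Idle and flagged "(alloc)". This is a rough Python sketch that splits rows on whitespace. `parse_llq` is a hypothetical helper, and the field positions are assumed from the listing above, not from any documented llq format.

```python
# llq listing from the post (prompt lines omitted).
LLQ_OUTPUT = """\
Id Owner Submitted ST PRI Class Running On
------------------------ ---------- ----------- -- --- ------------ -----------
esmf04m.498.0 zender 6/13 18:53 R 50 com_rg32 esmf08m
esmf04m.485.0 strombrg 6/10 16:55 I 50 all_sp8
esmf04m.529.0 zender 6/14 23:33 I 50 com_rg32
esmf04m.540.0 testacct 6/15 11:23 I 50 com_rg4 (alloc)
esmf04m.541.0 testacct 6/15 11:23 I 50 com_rg4 (alloc)
esmf04m.542.0 testacct 6/15 11:23 I 50 com_rg4 (alloc)
esmf04m.543.0 testacct 6/15 11:23 I 50 com_rg4 (alloc)

7 job step(s) in queue, 6 waiting, 0 pending, 1 running, 0 held, 0 preempted
"""


def parse_llq(text):
    """Return one dict per job step row (hypothetical helper).

    Assumes rows look like: id owner date time state priority class [running_on],
    with an id of the form host.job.step.
    """
    steps = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, separator, blank, and summary lines.
        if len(fields) < 7 or fields[0].count(".") != 2:
            continue
        steps.append({
            "id": fields[0],
            "owner": fields[1],
            "submitted": " ".join(fields[2:4]),
            "state": fields[4],
            "priority": int(fields[5]),
            "class": fields[6],
            "running_on": fields[7] if len(fields) > 7 else "",
        })
    return steps


steps = parse_llq(LLQ_OUTPUT)
stuck = [s["id"] for s in steps
         if s["state"] == "I" and s["running_on"] == "(alloc)"]
print("idle steps marked (alloc):", stuck)
```

With the listing above, this picks out the four com_rg4 steps, 540 through 543.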


...and llq -s 540 reports:

=============== Job Step esmf04m.540.0 ===============
Job Step Id: esmf04m.540.0
Job Name: tstclm01
Step Name: 0
Structure Version: 10
Owner: testacct
Queue Date: Tue Jun 15 11:23:30 PDT 2004
Status: Idle
Execution Factor: 1
Dispatch Time:
Completion Date:
Completion Code:
User Priority: 50
user_sysprio: 0
class_sysprio: 0
group_sysprio: 0
System Priority: -412109
q_sysprio: -412109
Notifications: Complete
Virtual Image Size: 15 kb
Large Page: N
Checkpointable: no
Ckpt Start Time:
Good Ckpt Time/Date:
Ckpt Elapse Time: 0 seconds
Fail Ckpt Time/Date:
Ckpt Accum Time: 0 seconds
Checkpoint File:
Restart From Ckpt: no
Restart Same Nodes: no
Restart: yes
Hold Job Until:
Env:
In: /dev/null
Out: tstclm01.txt
Err: tstclm01.txt
Initial Working Dir: /u/strombrg/clm
Dependency:
Resources:
Step Type: General Parallel
Node Usage: not_shared
Submitting Host: esmf04m
Notify User: testacct@esmf04m
Shell: /usr/local/bin/bash
LoadLeveler Group: No_Group
Class: com_rg4
Ckpt Hard Limit: undefined
Ckpt Soft Limit: undefined
Cpu Hard Limit: undefined
Cpu Soft Limit: undefined
Data Hard Limit: undefined
Data Soft Limit: undefined
Core Hard Limit: undefined
Core Soft Limit: undefined
File Hard Limit: undefined
File Soft Limit: undefined
Stack Hard Limit: undefined
Stack Soft Limit: undefined
Rss Hard Limit: undefined
Rss Soft Limit: undefined
Step Cpu Hard Limit: undefined
Step Cpu Soft Limit: undefined
Wall Clk Hard Limit: 00:10:00 (600 seconds)
Wall Clk Soft Limit: undefined
Comment:
Account: 36271012
Unix Group: franklin
NQS Submit Queue:
NQS Query Queues:
Negotiator Messages:
Adapter Requirement: (csss,MPI,shared,US)
Step Cpus: 0
Step Virtual Memory: 0.000 mb
Step Real Memory: 0.000 mb
Step Adapter Memory: 0 bytes
--------------------------------------------------------------------------------
Node
----

Name :
Requirements : (Arch == "R6000") && (OpSys == "AIX51")
Preferences :
Node minimum : 1
Node maximum : 1
Node actual : 0
Allocated Hosts :

Master Task
-----------

Executable : /u/strombrg/clm/clm.sh
Exec Args :
Num Task Inst:
Task Instance:

Task
----

Num Task Inst:
Task Instance:


==================== EVALUATIONS FOR JOB STEP esmf04m.540.0 ====================

SUMMARY

This LoadLeveler cluster has sufficient resources to run this job step.
Dynamic constraints and other scheduling requirements may prevent the job step from running at the present time.

ANALYSIS

Basic Requirements :

Class : com_rg4
Machine : (Arch == "R6000") && (OpSys == "AIX51")
Network/Adapter : (csss,MPI,shared,US)
Consumable Resource :

Requirements of Node Type 0 :

Minimum Instance(s) : 1
Number of Initiator(s)/Task(s) : 4

Status of machines in the LoadLeveler cluster:

The following machine(s) can be assigned to Node Type 0.

esmf04m

The following machines are unable to meet the Basic Requirements.

esmf08m : class = com_rg4 is not supported by this machine.
esmf07m : class = com_rg4 is not supported by this machine.
esmf06m : class = com_rg4 is not supported by this machine.
esmf05m : class = com_rg4 is not supported by this machine.
esmf03m : class = com_rg4 is not supported by this machine.
esmf02m : class = com_rg4 is not supported by this machine.
esmf01m : class = com_rg4 is not supported by this machine.
bash-2.05b$
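
To summarize the machine evaluation at the end of that report, the "not supported" lines can be collected per machine. Again this is only a Python sketch for illustration. The regular expression is an assumption based on the exact message text shown above, and `machines_rejecting_class` is a hypothetical helper.

```python
import re

# Tail of the llq -s evaluation from the post.
EVALUATION = """\
The following machine(s) can be assigned to Node Type 0.

esmf04m

The following machines are unable to meet the Basic Requirements.

esmf08m : class = com_rg4 is not supported by this machine.
esmf07m : class = com_rg4 is not supported by this machine.
esmf06m : class = com_rg4 is not supported by this machine.
esmf05m : class = com_rg4 is not supported by this machine.
esmf03m : class = com_rg4 is not supported by this machine.
esmf02m : class = com_rg4 is not supported by this machine.
esmf01m : class = com_rg4 is not supported by this machine.
"""

# Assumed message shape: "<machine> : class = <name> is not supported by this machine."
REJECT = re.compile(
    r"^(\S+)\s*:\s*class = (\S+) is not supported by this machine\.$"
)


def machines_rejecting_class(text):
    """Return [(machine, class_name)] for each rejection line (hypothetical helper)."""
    rejected = []
    for line in text.splitlines():
        match = REJECT.match(line.strip())
        if match:
            rejected.append((match.group(1), match.group(2)))
    return rejected


rejected = machines_rejecting_class(EVALUATION)
print("machines without the class:", [machine for machine, _ in rejected])
```

For this report, the result is the seven machines other than esmf04m, which agrees with esmf04m being the only machine listed as assignable.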


What do I need to do to get these jobs out of (alloc) state?

Searching the web and Google Groups turned up nothing.

Thanks in advance.
